September 15, 2024

Using Network Compression with MongoDB in Elixir

In this article, I would like to explain the effects of using network compression with MongoDB.

A parameter in the URL switches the compression of the network connection on and off. Compression is switched off by default.

The Elixir MongoDB driver supports two different compression types:

zlib compression works out of the box because zlib ships with Erlang/OTP. The compression rate is high, but so is the CPU load.

zstd was developed at Facebook and offers almost the same compression rate as zlib at lower CPU utilisation. zstd is therefore a good choice, although it requires an external library with bindings to the C implementation.

To activate zstd compression, simply add {:ezstd, "~> 1.1"} to the dependencies in your mix.exs file. The driver provides the related code. Once the dependency is in place, the zstd compressor can be used by appending compressors=zstd to the connection URL:

{:ok, top} = Mongo.start_link(url: "mongodb://localhost:27017/my_database?compressors=zstd")
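For orientation, a minimal mix.exs with both dependencies might look like the sketch below. The project name, the Elixir version and the version requirement for mongodb_driver are assumptions made for illustration; only the ezstd entry is taken from the text above.

```elixir
# mix.exs (sketch): MyApp, :my_app and the version numbers other than
# ezstd's are placeholders, not values from the article.
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", elixir: "~> 1.14", deps: deps()]
  end

  defp deps do
    [
      # The MongoDB driver itself (version requirement is an assumption)
      {:mongodb_driver, "~> 1.4"},
      # NIF bindings for zstd, needed only for compressors=zstd
      {:ezstd, "~> 1.1"}
    ]
  end
end
```

After editing the file, run mix deps.get. Since zlib ships with Erlang/OTP, zlib compression should not need an extra dependency; presumably it is selected the same way, with compressors=zlib in the URL.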

That was quite simple, and the article could end here. Ultimately, though, this article is more about the question:

Why should you use network compression?

This question becomes interesting when you have to pay for network data. On the one hand, consumer plans are priced by transmission speed, with separate limits for upload and download. The faster the guaranteed speed, the more expensive the plan.

On the other hand, cloud providers usually bill server traffic by volume and distinguish between public and private data traffic. If you have rented a dedicated server in the cloud and run your MongoDB in the Atlas cloud, the traffic between the two often counts as public data traffic, so you pay for the network volume.

In this case, network compression becomes interesting.

To measure the side effects on our server, we run a test that repeatedly fetches data from a local database. We measure the elapsed time and the input/output traffic to determine the compression rate. The rate naturally depends on the individual data, so the measurement must always be repeated with your own data. You should also analyse your own use cases:

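def benchmark() do
  # Snapshot of the VM-wide I/O counters (bytes received / bytes sent) before the run
  {{:input, input}, {:output, output}} = :erlang.statistics(:io)

  {t, _result} =
    :timer.tc(fn ->
      for _i <- 1..5 do
        # Fetch 30,000 documents in batches of 1,000 and discard them immediately
        Mongo.find(:onsen_db, "tasks", %{}, limit: 30_000, batch_size: 1000)
        |> Stream.reject(fn _x -> true end)
        |> Stream.run()
      end
    end)

  {{:input, input_1}, {:output, output_1}} = :erlang.statistics(:io)
  # {bytes received, bytes sent, elapsed time in milliseconds}
  {input_1 - input, output_1 - output, t / 1_000}
end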

Without compression, the code takes 8 seconds. 322 MB are received from the database and approx. 30 kB are sent. With zlib compression, the code takes 12 seconds, sends 38 kB and receives 55 MB. We save about 83% of the bytes when receiving the data from the database. Surprisingly, we send more bytes.

With zstd compression, the code takes 8.5 seconds, sends 35 kB and receives 49 MB. In this case, we save about 85% of the received bytes. Again, the code sends more bytes.
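The percentages follow directly from the received volumes. As a small worked example, the helper below (a hypothetical module, not part of the driver) computes the share of bytes saved relative to the uncompressed baseline, using the megabyte figures from the measurements above:

```elixir
defmodule CompressionSavings do
  # Percentage of bytes saved relative to the uncompressed baseline,
  # rounded to one decimal place.
  def saved_percent(baseline, compressed) when baseline > 0 do
    Float.round((1 - compressed / baseline) * 100, 1)
  end
end

# Received volume in MB: 322 uncompressed, 55 with zlib, 49 with zstd
IO.inspect(CompressionSavings.saved_percent(322, 55), label: "zlib") # zlib: 82.9
IO.inspect(CompressionSavings.saved_percent(322, 49), label: "zstd") # zstd: 84.8
```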

The slight increase in the amount of data sent can be explained by the fact that we receive the data in batches. The request for the next batch is probably so small that compression adds more overhead than it saves.
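This effect is easy to reproduce outside the driver. The sketch below uses :zlib.compress/1 from Erlang/OTP on two made-up payloads: a tiny one standing in for a "next batch" request, and a large repetitive one standing in for a batch of similar documents. It illustrates the general behaviour of the compressor and is not a measurement of what the driver actually sends.

```elixir
defmodule PayloadOverhead do
  # Returns {original size, zlib-compressed size} in bytes.
  def sizes(payload) when is_binary(payload) do
    {byte_size(payload), byte_size(:zlib.compress(payload))}
  end
end

# Tiny payload (hypothetical content): the zlib header and checksum
# outweigh any savings, so the compressed form is larger.
{small_raw, small_packed} = PayloadOverhead.sizes("getMore tasks 1000")

# Large, repetitive payload (hypothetical content): compresses very well.
{large_raw, large_packed} =
  PayloadOverhead.sizes(String.duplicate(~s({"status": "open", "priority": 3}), 1_000))

IO.inspect({small_raw, small_packed}, label: "small {raw, compressed}")
IO.inspect({large_raw, large_packed}, label: "large {raw, compressed}")
```

On top of the compressor's own framing, the wire protocol has to mark a message as compressed, which adds a few more bytes per request.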

Conclusion

  • Activating network compression requires only one URL parameter.
  • zstd compression is a good compromise between compression rate and speed, but requires additional NIF bindings.
  • In this test, compression saved more than 80% of the received data traffic; the actual rate depends on your data.
  • Network compression is a quick win for applications where the database traffic has to be paid for.