Compressing large data sets in Redis with Zlib - Ruby test case

Content

Introduction
Approach
Hardware
Memory gain due to compression
The read-only test case
The write-only test case
Read-write test case
Overall summary

Introduction

Year and a half ago, I was working with a software that used Redis as a buffer to store large sets of text data. We had some bottlenecks there. One of them was related to Redis and the large amount of data, that we had there (large comparing to RAM amount). Since then, I've wanted to check if using Zlib library would be a big improvement or would it be just a next bottleneck (CPU). Unfortunately I don't have access to this software any more, that's why I've decided to create a simple test case just to check this matter.

Approach

We mostly did read operations from Redis (the data was inserted and it was used many times through the cache life cycle), however I've wanted to test as many cases as I could, that's why I've tested following cases (each of them is tested with and without compression):

Read only operations
Write only operations
Read/write operations

I've also considered given data sets:

empty file source
small file source (16.6 kB)
medium file source (47.5 kB)
big file source (221.9 kB)
large file source (4.2 MB)

To make it quite accurate I've tested it 10 times for given case and also for different number of iterations (from 1 to 1000). You can check the results on charts below.

Hardware

Hardware used was rather simple:

2 GB RAM
Pentium 4 HT
2TB SATA WD Disk
Unstressed Debian Linux on board

Memory gain due to compression

The memory gain is quite obvious - it depends on the compression level that we can achive using Zlib library. Below you can see the memory usage before and after compression - for all the given example files:

So, as you can see, except the first case, with the empty file, we gain around 75-85% of space, when using the Zlib compression of text data. In my opinion this is a huge improvement.

The read-only test case

First let's see how the read-only performance decreases due to the compression mechanism that was implemented. On the chart we can see both - the non-compressed and compressed script versions, both for given (same) number of iterations. Each chart represents one file size.

Empty file

For empty file, the speed decrease is around 6%.

Small file

For small file, the speed decrease is around 26%. But still - 1000 reads take us less than a half of second.

Medium file

Here the difference is around 47.5% - the bigger the file gets, the bigger the decompression overhead is. Since the read operations from Redis are quite fast, the decompression time seems to affect the whole process in a quite high level.

Big file

Diffence is around 55%. Which is still tolerable. Of course if we would do a lot of readings that should be performed really fast - this would be a problem.

Large file

Difference is around 33.5%, which is quite weird, since it is lower than with the previous smaller file.

Read-only summary

As you can see above, the decompression overhead is around 30-50%. Is it much? It depends what you are doing. If you are ready to exchange such a speed loss for a memory saving, this might be the solution for you (of course for now we are talking about the read-only scenario).

The write-only test case

Empty file

Difference is around 29%. Seems like the compression impact comparing to decompression is way bigger (nearly 5 times). This doesn't look good for the write-only test case, however let's see more examples.

Small file

Difference is around 161%. This is huge difference - even with small data amounts, compression seams to really influence the write process.

Medium file

Difference is around 520%. No comments here - this level is way beyond the heavy duty usage tolerance level. With such a difference, using compression on heavy-write system would be

Big file

Difference is around 1063%.

Large file

Difference is around 675%.

Write-only summary

It seems like the heavy duty write-only usage with Zlib compression won't be a good idea. The time spent compressing the input data is way beyond any acceptable level. Write process increases so much, that it would be way better to use storage that has compression already implemented (or one that is not in-memory).

Read-write test case

This is the last test case. I've performed one write per 1 read.

Empty file

Difference is around 19.8%.

Small file

Difference is around 104%.

Medium file

Difference is around 350%.

Big file

Difference is around 667%.

Large file

Difference is around 402%.

Read-write summary

It looks like the write process slowed the whole program in a big way. This is a second case that shows, that we should not use this approach in a heavy-write environment

Overall summary

First of all, if you have such a problem at the beginning of your software development, probably you should use a different data-storage (if you plan to have more data than you can handle in memory). This article and the compression "tweak" that I've tried to check/use should be implemented only when you don't have any other option and the Redis memory usage is your biggest bottleneck. Also, you should not try to use this idea when you do a lot of writing to Redis and a only few reads per data chunk. This would slow down your software dramatically. Of course if you do less than few hundred operations per second, you probably would not feel this, but still, try to think about future issues as well.

On the other hand, it seems that when we have an application that uses Redis mostly for reading stuff from it, we might think about compressing the input data. However as I mentioned before - try not to overdo it and implement such a mechanism only when you really need it.

To be honest, I'm a bit disappointed. I thought, that using the compression would not be such a big overhead, however it seems like Redis is way faster than I thought comparing to compression process.

2 Comments

Lionel Bouton
April 1, 2013 — 03:55

You should have a look at compression algorithm tuned for speed like snappy or lzo.
See https://github.com/miyucy/snappy or https://github.com/expectedbehavior/lzo-ruby for Ruby interfaces.
Mayur Rustagi
April 2, 2013 — 17:41

Not fair. Use LZO and youll only see improvements. Alleast use a block compression like bzip2.

1 Pingback

Redis with Node.js: Compress to Use Redis Memory Efficiently - Daniel Auener