Summary

Within the context of in-memory database systems, lightweight compression algorithms play an important role in efficiently storing and processing large amounts of data in main memory. Compared to traditional compression algorithms such as Huffman or LZW, lightweight compression algorithms attain comparably good compression rates by exploiting context knowledge. Additionally, they allow for faster compression as well as decompression. In recent years, the diversity of lightweight compression algorithms has increased, because algorithms have to be optimized for specific context knowledge. To manage this diversity, we considered the modularization of lightweight compression algorithms and developed a general modular compression scheme. By exchanging particular modules or merely their input parameters, we can easily build a large number of different algorithms.

The input that needs to be compressed is a potentially infinite sequence of data values. The output is a potentially infinite sequence of compressed data packages. Our general compression scheme consists of four modules. The first module is a word generation component. It decides which finite initial sequence of the input will be processed by the following modules; the tail of the input sequence is again handled by the word generation component. For each output of the word generation component, the parameter calculation component determines parameters. In the simplest case, the third module is an encoder, which typically encodes a single value using the calculated parameters. The last module assembles all encoded values together with the parameters that are needed for decompression. In the more complex case, there is no encoder at this position; instead, the data passes through the scheme one or more additional times for a finer-grained subdivision.
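To make the structure of the scheme concrete, the following minimal C++ sketch models the four modules as interfaces plus a driver loop for the simple case (an encoder in the third position, no recursion). The concrete names and signatures (WordGenerator, ParameterCalculator, Encoder, Combiner, compress) and the choice of 64-bit integer values are our own illustration, not prescribed by the scheme.

    #include <cstdint>
    #include <vector>

    using Value  = uint64_t;
    using Word   = std::vector<Value>;    // finite prefix chosen by the word generator
    using Params = std::vector<uint64_t>; // parameters calculated per word
    using Packet = std::vector<uint64_t>; // one compressed data package

    // Module 1: chooses the finite prefix input[pos..) that the following
    // modules process and advances pos past it; the tail is handled in the
    // next iteration of the driver loop.
    struct WordGenerator {
        virtual Word next(const std::vector<Value>& input, size_t& pos) = 0;
        virtual ~WordGenerator() = default;
    };

    // Module 2: derives parameters (e.g. a reference value or a bit width) per word.
    struct ParameterCalculator {
        virtual Params calculate(const Word& word) = 0;
        virtual ~ParameterCalculator() = default;
    };

    // Module 3 (simple case): encodes a single value using the calculated parameters.
    struct Encoder {
        virtual uint64_t encode(Value v, const Params& p) = 0;
        virtual ~Encoder() = default;
    };

    // Module 4: assembles the encoded values and the parameters needed for decompression.
    struct Combiner {
        virtual Packet combine(const std::vector<uint64_t>& codes, const Params& p) = 0;
        virtual ~Combiner() = default;
    };

    // Driver: repeatedly runs one word through the four modules until the input is consumed.
    std::vector<Packet> compress(const std::vector<Value>& input, WordGenerator& wg,
                                 ParameterCalculator& pc, Encoder& enc, Combiner& cmb) {
        std::vector<Packet> output;
        size_t pos = 0;
        while (pos < input.size()) {
            Word word     = wg.next(input, pos);                          // module 1
            Params params = pc.calculate(word);                           // module 2
            std::vector<uint64_t> codes;
            for (Value v : word) codes.push_back(enc.encode(v, params));  // module 3
            output.push_back(cmb.combine(codes, params));                 // module 4
        }
        return output;
    }

In the more complex, recursive case, the role of the Encoder would instead be played by another instance of the whole scheme operating on the word produced by the first two modules.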

General compression concepts like frame of reference, delta encoding, dictionary encoding, run-length encoding, etc. can be expressed as patterns of the scheme, which are then specialized into well-defined algorithms. The compression scheme is suitable for modularizing a multitude of compression algorithms. Certain modules and module arrangements often recur across different algorithms, for example the binary packing pattern, which appears in all Simple- and PFOR-based algorithms (a sketch follows below). Using modularization, we can also create new algorithms that do not exist yet and pursue the idea of compression algorithm engineering, which could be very useful whenever new context information becomes available.
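As an illustration of such a pattern, the following standalone C++ sketch expresses frame of reference combined with binary packing in the four stages of the scheme: block-wise word generation, parameter calculation (reference value and common bit width), per-value encoding (offset from the reference), and a combiner that bit-packs the offsets into a compressed package. The block size of 32, the type choices, and the function names (compressBlock, compressFOR) are our own assumptions for illustration.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Package {                   // one compressed data package
        uint64_t reference;            // parameters needed for decompression
        unsigned bits;
        std::vector<uint64_t> packed;  // bit-packed offsets
        size_t count;
    };

    static unsigned bitsNeeded(uint64_t x) {  // smallest b with x < 2^b, at least 1
        unsigned b = 1;
        while (b < 64 && (x >> b)) ++b;
        return b;
    }

    Package compressBlock(const std::vector<uint64_t>& word) {
        // Parameter calculation: reference value and common bit width for the block.
        uint64_t ref    = *std::min_element(word.begin(), word.end());
        uint64_t maxOff = *std::max_element(word.begin(), word.end()) - ref;
        unsigned bits   = bitsNeeded(maxOff);

        // Encoder + combiner: encode each value as an offset and bit-pack it.
        Package p{ref, bits,
                  std::vector<uint64_t>((word.size() * bits + 63) / 64, 0),
                  word.size()};
        size_t pos = 0;
        for (uint64_t v : word) {
            uint64_t off = v - ref;                   // encoded value
            p.packed[pos / 64] |= off << (pos % 64);
            if (pos % 64 + bits > 64)                 // offset straddles a 64-bit word
                p.packed[pos / 64 + 1] |= off >> (64 - pos % 64);
            pos += bits;
        }
        return p;
    }

    // Word generation: cut the input into fixed-size blocks of 32 values
    // (the remainder forms a shorter final block) and compress each block.
    std::vector<Package> compressFOR(const std::vector<uint64_t>& input) {
        std::vector<Package> out;
        for (size_t i = 0; i < input.size(); i += 32) {
            size_t end = std::min(i + 32, input.size());
            out.push_back(compressBlock({input.begin() + i, input.begin() + end}));
        }
        return out;
    }

Exchanging only the parameter calculation and encoder stages in this sketch would, for example, turn frame of reference into delta encoding with binary packing, which is the kind of reuse the modularization aims at.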

In future work, we will implement this generic scheme as well as specific algorithms based on it, and compare these new implementations with existing ones. Another interesting direction is the automatic construction of algorithms based on context knowledge.
