Motivation
In-memory column-stores make extensive use of lightweight data compression techniques such as dictionary coding, null suppression, and run-length encoding to address the increasingly severe bottleneck between main memory and fast multi-core processors. Compressed data is smaller, so it makes better use of main memory bandwidth and the cache hierarchy. Furthermore, many database operators can process compressed data directly, without prior decompression. Consequently, lightweight data compression can significantly improve query performance.
In main memory-centric column-stores, accessing intermediate results during query processing is as expensive as accessing the base data, since both reside in main memory. Thus, compression is promising not only for the base data, but also for intermediates. However, existing systems do not fully exploit the potential of compressed intermediates: during query processing, they keep the data compressed as long as possible, but once the data has been decompressed for a certain operator, they do not recompress it because of the computational overhead this implies. With modern hardware and state-of-the-art lightweight compression algorithms, the benefits of compressed data can outweigh this overhead.
Our Vision
Our vision is balanced query processing based on compressed intermediates in a main memory-centric column-store. That is, in a query execution plan of compression-aware physical operators, every intermediate result shall be represented using a suitable lightweight compression algorithm, which is selected by compression-aware query optimization such that the benefits of compression outweigh its costs.
To achieve this goal, this research project addresses three aspects of the problem: the structural aspect, the operational aspect, and the optimization aspect.
Structural Aspect
The structural aspect lays the foundations of this research project by focusing on the basics of lightweight data compression. In particular, we focus on
- highly efficient implementations of existing lightweight compression algorithms exploiting features of modern processors such as single instruction multiple data (SIMD) instruction set extensions
- highly efficient transformations of the data in the compressed representation of one compression algorithm to the compressed representation of another compression algorithm
- empirical and theoretical investigation of the functional and non-functional properties of lightweight compression and transformation algorithms subject to the characteristics of the data to be processed
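The idea of transforming data directly between two compressed representations can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the (value, run length) pair layout, and the choice of run-length encoding and dictionary coding as source and target are made up for this example and are not taken from the project's implementation, which would operate on packed arrays and use SIMD instructions.

```python
from typing import List, Tuple

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a sequence as (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_to_dict(runs: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Turn RLE runs into (sorted dictionary, codes) without first
    rebuilding the uncompressed column (hypothetical helper)."""
    dictionary = sorted({v for v, _ in runs})
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes: List[int] = []
    for v, length in runs:
        # One dictionary lookup per run instead of one per element.
        codes.extend([code_of[v]] * length)
    return dictionary, codes

column = [300, 300, 300, 7, 7, 300]
runs = rle_encode(column)
dictionary, codes = rle_to_dict(runs)
```

In this sketch the transformation works per run rather than per element, which is where a direct transformation can save work compared to decompressing and recompressing.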
Operational Aspect
In the operational aspect, we investigate how to integrate compression into query execution. In doing so, we use the efficient compression and transformation algorithms from the structural aspect. Our research interests include
- an appropriate processing model for compressed intermediates
- physical database operators on compressed data
- different degrees of the integration of compression into query execution and the resulting trade-offs between performance and code complexity
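To make the notion of physical operators on compressed data more concrete, here is a minimal Python sketch of two operators that never materialize the uncompressed column. The function names and data layouts (RLE runs as (value, run length) pairs, a sorted dictionary with integer codes) are assumptions for illustration only and do not reflect the project's actual operator interfaces.

```python
from bisect import bisect_left
from typing import List, Tuple

def sum_rle(runs: List[Tuple[int, int]]) -> int:
    """Aggregate an RLE-compressed column: one multiplication per run."""
    return sum(value * length for value, length in runs)

def select_eq_dict(dictionary: List[int], codes: List[int], constant: int) -> List[int]:
    """Positions where the dictionary-coded column equals `constant`.
    The constant is translated to a code once; the scan then compares
    codes only. Assumes `dictionary` is sorted."""
    i = bisect_left(dictionary, constant)
    if i == len(dictionary) or dictionary[i] != constant:
        return []  # constant does not occur in the column at all
    return [pos for pos, code in enumerate(codes) if code == i]

# Both inputs represent the column [300, 300, 300, 7, 7, 300].
runs = [(300, 3), (7, 2), (300, 1)]
dictionary, codes = [7, 300], [1, 1, 1, 0, 0, 1]

total = sum_rle(runs)
hits = select_eq_dict(dictionary, codes, 7)
misses = select_eq_dict(dictionary, codes, 42)
```

Which operator suits which compressed format is exactly the kind of trade-off the integration degrees above are meant to explore; the sketch fixes one format per operator purely for brevity.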
Optimization Aspect
There is no single best compression algorithm; the best choice always depends on the data characteristics. Thus, compression must be employed wisely in a query plan so that its benefits outweigh its computational overhead. Based on our understanding of the algorithms’ and operators’ properties obtained in the structural and operational aspects, we develop and investigate
- cost models for compression and transformation algorithms as well as for physical operators on compressed data
- compression-aware strategies for the database query optimizer
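A toy version of such a data-dependent decision might look like the following Python sketch. It is a deliberately simplified assumption, not the project's cost model: it estimates only the compressed size in bits from two data characteristics (maximum value and number of runs), assumes non-empty columns of non-negative integers and 32-bit run lengths, and ignores the runtime costs of compression and decompression that a real cost model would also have to cover. All names are hypothetical.

```python
from typing import Dict, List

def count_runs(values: List[int]) -> int:
    """Number of runs of equal adjacent values (input must be non-empty)."""
    return 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)

def estimate_bits(values: List[int]) -> Dict[str, int]:
    """Estimated size in bits per scheme, for non-negative integers."""
    n = len(values)
    # Bits needed for the largest value, as used by null suppression.
    width = max(1, max(values).bit_length())
    return {
        "uncompressed": 32 * n,                     # assumed 32-bit integers
        "null_suppression": width * n,              # fixed bit width per value
        "rle": count_runs(values) * (width + 32),   # value + assumed 32-bit run length
    }

def choose_scheme(values: List[int]) -> str:
    """Pick the scheme with the smallest estimated size."""
    sizes = estimate_bits(values)
    return min(sizes, key=sizes.get)

few_long_runs = [5] * 1000 + [9] * 1000
no_runs = list(range(1000))

choice_a = choose_scheme(few_long_runs)
choice_b = choose_scheme(no_runs)
```

With these inputs the estimate favors run-length encoding for the column with two long runs and null suppression for the column without repeated neighbors, which mirrors the point that the choice depends on the data.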
Related Publications
@inproceedings{,
author = {Annett Ungeth\"{u}m and Patrick Damme and Johannes Pietrzyk and Alexander Krause and Dirk Habich and Wolfgang Lehner},
title = {Balancing Performance and Energy for Lightweight Data Compression Algorithms},
booktitle = {New Trends in Databases and Information Systems - ADBIS 2017 Short Papers and Workshops, AMSD, BigNovelTI, DAS, SW4CH, DC, Nicosia, Cyprus, September 24-27, 2017, Proceedings},
year = {2017},
month = {9},
pages = {37--44},
numpages = {8},
url = {https://doi.org/10.1007/978-3-319-67162-8_5},
publisher = {Springer}
}

@inproceedings{,
author = {Patrick Damme},
title = {Query Processing Based on Compressed Intermediates},
booktitle = {Proceedings of the VLDB 2017 PhD Workshop co-located with the 43rd International Conference on Very Large Databases (VLDB 2017), Munich, Germany, August 28, 2017.},
series = {CEUR Workshop Proceedings},
volume = {1882},
year = {2017},
month = {8},
url = {http://ceur-ws.org/Vol-1882/paper05.pdf},
publisher = {CEUR-WS.org}
}

@inproceedings{,
author = {Patrick Damme and Dirk Habich and Juliana Hildebrandt and Wolfgang Lehner},
title = {Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses)},
booktitle = {Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017.},
year = {2017},
month = {3},
pages = {72--83},
numpages = {12},
url = {https://doi.org/10.5441/002/edbt.2017.08},
publisher = {OpenProceedings.org}
}

@inproceedings{,
author = {Patrick Damme and Dirk Habich and Juliana Hildebrandt and Wolfgang Lehner},
title = {Insights into the Comparative Evaluation of Lightweight Data Compression Algorithms},
booktitle = {Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017.},
year = {2017},
month = {3},
pages = {562--565},
numpages = {4},
url = {https://doi.org/10.5441/002/edbt.2017.70},
publisher = {OpenProceedings.org}
}

@inproceedings{,
author = {Juliana Hildebrandt and Dirk Habich and Patrick Damme and Wolfgang Lehner},
title = {Model kit for lightweight data compression algorithms},
booktitle = {Proceedings of the 19th International Conference on Extending Database Technology, {EDBT} 2016, Bordeaux, France, March 15-16, 2016},
year = {2016},
month = {3},
pages = {692--693},
numpages = {2},
doi = {10.5220/0006009301890194},
crossref = {DBLP:conf/edbt/2016}
}

@conference{,
author = {Juliana Hildebrandt and Dirk Habich and Thomas K\"{u}hn and Patrick Damme and Wolfgang Lehner},
title = {Metamodeling Lightweight Data Compression Algorithms and its Application Scenarios},
booktitle = {Proceedings of the ER Forum 2017 and the ER 2017 Demo Track co-located with the 36th International Conference on Conceptual Modelling (ER 2017), Valencia, Spain, November 6-9, 2017.},
year = {2017},
month = {11},
location = {Valencia, Spain},
pages = {128--141},
numpages = {14},
publisher = {CEUR-WS.org}
}

@inproceedings{,
author = {Juliana Hildebrandt and Dirk Habich and Patrick Damme and Wolfgang Lehner},
title = {Compression-Aware In-Memory Query Processing: Vision, System Design, and Beyond},
booktitle = {Proceedings of the 2016 Joint Workshop on Accelerating Analytics and In-Memory Data Management Systems, LNCS},
year = {2016}
}

@inproceedings{,
author = {Patrick Damme and Dirk Habich and Wolfgang Lehner},
title = {A Benchmark Framework for Data Compression Techniques},
booktitle = {Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things - 7th TPC Technology Conference, TPCTC 2015, Kohala Coast, HI, USA, August 31 - September 4, 2015. Revised Selected Papers},
series = {Lecture Notes in Computer Science},
volume = {9508},
year = {2015},
month = {8},
pages = {77--93},
numpages = {17},
url = {http://dx.doi.org/10.1007/978-3-319-31409-9_6},
publisher = {Springer}
}

@inproceedings{,
author = {Till Kolditz and Dirk Habich and Patrick Damme and Wolfgang Lehner and Dmitrii Kuvaiskii and Christof Fetzer},
title = {Resiliency-aware Data Compression for In-memory Database Systems},
booktitle = {DATA 2015 - Proceedings of 4th International Conference on Data Management Technologies and Applications, Colmar, Alsace, France, 20-22 July, 2015.},
year = {2015},
isbn = {978-989-758-103-8},
pages = {326--331},
url = {http://dx.doi.org/10.5220/0005557303260331},
editor = {Markus Helfert and Andreas Holzinger and Orlando Belo and Chiara Francalanci}
}

@inproceedings{,
author = {Dirk Habich and Patrick Damme and Wolfgang Lehner},
title = {Optimierung der Anfrageverarbeitung mittels Kompression der Zwischenergebnisse},
booktitle = {Datenbanksysteme f\"{u}r Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs ``Datenbanken und Informationssysteme'' (DBIS)},
volume = {241},
year = {2015},
isbn = {978-3-88579-635-0},
pages = {259--278},
url = {http://subs.emis.de/LNI/Proceedings/Proceedings241/article42.html}
}

@inproceedings{,
author = {Juliana Hildebrandt and Dirk Habich and Patrick Damme and Wolfgang Lehner},
title = {Modularisierung leichtgewichtiger Kompressionsalgorithmen},
booktitle = {Proceedings of the 27th GI-Workshop Grundlagen von Datenbanken, Gommern, Germany, May 26-29, 2015.},
series = {CEUR Workshop Proceedings},
volume = {1366},
year = {2015},
pages = {54--59},
url = {http://ceur-ws.org/Vol-1366/paper11.pdf},
editor = {Gunter Saake and David Broneske and Sebastian Dorok and Andreas Meister}
}

@inproceedings{,
author = {Patrick Damme and Dirk Habich and Wolfgang Lehner},
title = {Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios},
booktitle = {Advances in Databases and Information Systems - 19th East European Conference, ADBIS 2015, Poitiers, France, September 8-11, 2015, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {9282},
year = {2015},
isbn = {978-3-319-23134-1},
pages = {151--165},
url = {http://dx.doi.org/10.1007/978-3-319-23135-8_11},
editor = {Tadeusz Morzy and Patrick Valduriez and Ladjel Bellatreche}
}

Changing the Compression Scheme during Query Processing in Main Memory-centric Column-Stores
Johann Hertrampf December 1st, 2016 until April 19th, 2017
Project Thesis
Supervision: Patrick Damme, Dirk Habich
Johannes Pietrzyk February 1st, 2017 until August 1st, 2017
Project Thesis
Supervision: Patrick Damme, Annett Ungethüm
Related Student Theses
Juliana Hildebrandt September 24th, 2014 until March 24th, 2015
Diploma Thesis
Supervision: Dirk Habich, Patrick Damme
Lightweight compression algorithms and interfaces for flexible storage architectures
Paul Peschel July 1st, 2015 until January 9th, 2015
Master Thesis
Supervision: Thomas Kissinger, Patrick Damme
Lightweight Techniques for Compression and Transformation
Patrick Damme January 1st, 2014 until December 4th, 2014
Master Thesis
Supervision: Dirk Habich