GitHub Page
Visit our GitHub page: https://brics-db.github.io
Background
The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements [1]. To satisfy these requirements for a constantly increasing amount of data, database systems continuously adapt to new hardware features [2, 3, 4, 5, 6, 7], for instance new instruction sets, increasing core counts, changing core/cache topologies, increasing DRAM bandwidths, or new persistence technologies (NVRAM) [8, 9, 10]. These advances come with a drawback, though: it has long been known that hardware is subject to soft and hard errors [11, 12, 13]. Soft errors, also called bit flips, may be caused by cosmic rays, heat, hardware aging, or electrical crosstalk, the latter being a consequence of the ongoing miniaturization of integrated circuits [11, 14]. Hardware aging even leads to increasing error rates during a system’s run-time. Despite increasing error rates, database research has so far been able to focus on improving performance (higher throughput, lower latency) by leveraging hardware improvements, without considering these side effects.
So far, this was possible because soft errors were masked by hardware, i.e., they either did not propagate to the software layer or led to process or system crashes. Server-grade hardware like ECC DRAM can correct single bit flips and detect double bit flips. However, the picture has changed, as large-scale systems are already suffering from increasing error rates. Scaling up today’s hardware detection and correction capabilities is not always sensible and leads to high overheads in terms of additional code space (memory area) as well as coding complexity and latency (multiple or more complex codes). Consequently, error resilience becomes a major challenge for both hardware and software system designers, and in recent years researchers have gained the insight that a cross-layer approach is required for tackling hardware errors [15]. This opens a very interesting and challenging new research area.
The idea is that each layer in the hardware/software stack detects and corrects those errors for which it is better suited than the other layers. For the database domain, this requires novel approaches, as resilience has so far mainly been left to the hardware and operating system layers. For instance, database systems could use context knowledge about data types, algorithms, internal data structures, and inherent redundancy to detect and correct hardware errors when and where it is sensible.
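As a minimal sketch of what "context knowledge about data types" could mean in practice, consider a dictionary-encoded column: every stored code must be a valid index into the dictionary, so any code outside that range can only be the result of corruption. The function names, the sample data, and the choice of a dictionary-encoded column are assumptions made for illustration; this is not taken from the project's code base.

```python
def flip_bit(word: int, bit: int) -> int:
    """Simulate a soft error by inverting one bit of a stored word."""
    return word ^ (1 << bit)


def find_invalid_codes(column: list[int], dictionary_size: int) -> list[int]:
    """Return the positions whose code cannot be a valid dictionary index."""
    return [pos for pos, code in enumerate(column)
            if not 0 <= code < dictionary_size]


# Hypothetical sample data: three distinct values, five encoded rows.
dictionary = ["Berlin", "Dresden", "Leipzig"]
column = [0, 2, 1, 1, 0]

# Inject a bit flip into row 3: code 1 becomes 33, which is outside [0, 3).
column[3] = flip_bit(column[3], 5)

corrupted_positions = find_invalid_codes(column, len(dictionary))
print(corrupted_positions)
```

Such a range check is cheap but incomplete: a flip that turns one valid code into another valid code (for example 0 into 1) stays within range and goes unnoticed, which is one reason dedicated error-detecting codes are of interest.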
Our Vision
We strive to develop methodologies to detect and correct certain types of hardware errors inside a database. Since modern database systems keep most if not all of the relevant business data in main memory, we concentrate on main memory-centric column stores. We currently have a paper under submission that includes many of our findings; we will share it here after acceptance. So, stay tuned!
Interactive AN-Coding Analysis Tool
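For readers unfamiliar with AN coding, the technique named in this heading and in the publications below, here is a minimal sketch. An integer is encoded by multiplying it with a constant A, and a stored word is accepted only if it is still divisible by A. The constant A = 641 and all function names are assumptions chosen for illustration and are not taken from the project.

```python
A = 641  # assumed constant for illustration; any odd A > 1 detects all single bit flips


def an_encode(value: int) -> int:
    """Encode an integer as a multiple of A."""
    return value * A


def an_is_valid(code_word: int) -> bool:
    """A code word is valid only if it is divisible by A."""
    return code_word % A == 0


def an_decode(code_word: int) -> int:
    """Decode a code word, raising if corruption is detected."""
    if not an_is_valid(code_word):
        raise ValueError("bit flip detected: code word is not a multiple of A")
    return code_word // A


stored = an_encode(42)

# A single bit flip changes the word by a power of two, which an odd A > 1
# never divides, so the corrupted word is no longer a multiple of A.
corrupted = stored ^ (1 << 3)
print(an_is_valid(stored), an_is_valid(corrupted))
```

Because multiplication by A distributes over addition, encoded values can be added without decoding first: `an_encode(a) + an_encode(b)` equals `an_encode(a + b)`. Flips of several bits at once are not always detected, and how often they slip through depends on the choice of A.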
Related People
The research work and some of the publications were done in collaboration with the following people:
- Matthias Werner, https://tu-dresden.de/zih/die-einrichtung/struktur/matthias-werner
- Dmitrii Kuvaiskii, https://tu-dresden.de/ing/informatik/sya/se/die-professur/beschaeftigte/dmitrii-kuvaiskii
Related Publications
@inbook{,
author = {Matthias Werner and Till Kolditz and Tomas Karnagel and Dirk Habich and Wolfgang Lehner},
title = {Multi-GPU Approximation for Silent Data Corruption of AN Codes},
booktitle = {Further Improvements in the Boolean Domain},
year = {2018},
month = {1},
isbn = {978-1-5275-0371-7},
url = {http://www.cambridgescholars.com/further-improvements-in-the-boolean-domain},
publisher = {Cambridge Scholars Publishing}
}

@inproceedings{,
author = {Matthias Werner and Till Kolditz and Tomas Karnagel and Dirk Habich and Wolfgang Lehner},
title = {Multi-GPU Approximation Methods for Silent Data Corruption of AN-Coding},
booktitle = {12th International Workshop on Boolean Problems, IWSBP 2016, Freiberg, Germany},
year = {2016},
month = {9}
}

@incollection{10.1007/978-3-319-30162-4,
author = {Till Kolditz and Dirk Habich and Dmitrii Kuvaiskii and Wolfgang Lehner and Christof Fetzer},
title = {Needles in the Haystack -- Tackling Bit Flips in Lightweight Data Compression},
booktitle = {Data Management Technologies and Applications},
series = {Communications in Computer and Information Science},
volume = {584},
year = {2016},
isbn = {978-3-319-30162-4},
pages = {135--153},
numpages = {19},
url = {http://www.springer.com/de/book/9783319301617},
publisher = {Springer International Publishing}
}

@inproceedings{,
author = {Till Kolditz and Dirk Habich and Patrick Damme and Wolfgang Lehner and Dmitrii Kuvaiskii and Christof Fetzer},
title = {Resiliency-aware Data Compression for In-memory Database Systems},
booktitle = {DATA 2015 - Proceedings of 4th International Conference on Data Management Technologies and Applications, Colmar, Alsace, France, 20-22 July, 2015.},
year = {2015},
isbn = {978-989-758-103-8},
pages = {326--331},
url = {http://dx.doi.org/10.5220/0005557303260331},
editor = {Markus Helfert and Andreas Holzinger and Orlando Belo and Chiara Francalanci}
}

@inproceedings{,
author = {Till Kolditz and Benjamin Schlegel and Dirk Habich and Wolfgang Lehner},
title = {Online Bit Flip Detection for In-Memory B-Trees Live!},
booktitle = {Datenbanksysteme f\"{u}r Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS)},
volume = {241},
year = {2015},
isbn = {978-3-88579-635-0},
pages = {675--678},
url = {http://subs.emis.de/LNI/Proceedings/Proceedings241/article3.html}
}

@inproceedings{,
author = {Till Kolditz and Thomas Kissinger and Benjamin Schlegel and Dirk Habich and Wolfgang Lehner},
title = {Online bit flip detection for in-memory B-trees on unreliable hardware},
booktitle = {Proceedings of the Tenth International Workshop on Data Management on New Hardware},
series = {DaMoN '14},
year = {2014},
month = {6},
isbn = {978-1-4503-2971-2},
location = {Snowbird, Utah},
pages = {5},
numpages = {9},
url = {http://doi.acm.org/10.1145/2619228.2619233},
acmid = {2619233},
publisher = {ACM},
address = {New York, NY, USA}
}

@article{,
author = {Wolfgang Lehner and Matthias B\"{o}hm and Christof Fetzer},
title = {Resiliency-Aware Data Management},
journal = {PVLDB},
volume = {4},
year = {2011},
pages = {1462--1465},
url = {http://www.vldb.org/pvldb/vol4/p1462-boehm.pdf}
}

Untersuchung von SSDs für den Einsatz in Datenbankmanagementsystemen (Investigation of SSDs for Use in Database Management Systems)
Konrad Gube, January 1st, 2015 until October 28th, 2015
Project Thesis. Supervision: Till Kolditz
Analyse eines Forschungsthemas: Resilience – Stand und Ausblick im Datenbankenumfeld (Analysis of a Research Topic: Resilience, State of the Art and Outlook in the Database Domain)
Florian Weigelt, May 19th, 2014 until September 1st, 2014
Project Thesis. Supervision: Till Kolditz
[1] D. Abadi et al. The Beckman report on database research. Commun. ACM, 59(2):92–99, 2016.
[2] P. A. Boncz, M. L. Kersten, and S. Manegold. Breaking the memory wall in MonetDB. Commun. ACM, 51(12):77–85, 2008.
[3] S. Breß, H. Funke, and J. Teubner. Robust query processing in co-processor-accelerated databases. In SIGMOD, pages 1891–1906, 2016.
[4] J. Do, Y. Kee, J. M. Patel, C. Park, K. Park, and D. J. DeWitt. Query processing on smart SSDs: opportunities and challenges. In SIGMOD, pages 1221–1230, 2013.
[5] T. Karnagel, D. Habich, and W. Lehner. Adaptive work placement for query processing on heterogeneous computing resources. PVLDB, 10(7):733–744, 2017.
[6] F. Li, S. Das, M. Syamala, and V. R. Narasayya. Accelerating relational databases by leveraging remote memory and RDMA. In SIGMOD, pages 355–370, 2016.
[7] I. Oukid, J. Lasperas, A. Nica, T. Willhalm, and W. Lehner. FPTree: A hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory. In SIGMOD, pages 371–386, 2016.
[8] S. Borkar and A. A. Chien. The future of microprocessors. Commun. ACM, 54(5):67–77, 2011.
[9] J. Henkel. Emerging memory technologies. IEEE Design & Test, 34(3):4–5, 2017.
[10] F. J. Pollack. New microarchitecture challenges in the coming generations of CMOS process technologies. In Symposium on Microarchitecture, page 2, 1999.
[11] S. Y. Borkar. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro, 25(6):10–16, 2005.
[12] J. Henkel, L. Bauer, N. Dutt, P. Gupta, S. R. Nassif, M. Shafique, M. B. Tahoori, and N. Wehn. Reliable on-chip systems in the nano-era: lessons learnt and future trends. In DAC, pages 99:1–99:10, 2013.
[13] M. Spica and T. M. Mak. Do we need anything more than single bit error correction (ECC)? In MTDT, pages 111–116, 2004.
[14] A. A. Hwang, I. A. Stefanovici, and B. Schroeder. Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design. In ASPLOS, pages 111–122, 2012.
[15] S. Rehman, M. Shafique, and J. Henkel. Reliable Software for Unreliable Hardware – A Cross Layer Perspective. Springer, 2016.