Abstract

The HANA database contains several highly optimized evaluation algorithms for columnar data. Today, the choice of operator implementation is based largely on heuristics. A truly cost-based query optimizer, however, requires that the evaluation time of every HANA query operator be analyzed.

The goal of the thesis is to develop a test harness that can be used to examine the runtime characteristics of the HANA column store. The generated measurements will be fed into a regression library to derive the coefficients of the cost formulas. The type of cost function (e.g. linear or multivariate) will be chosen based on a code-level analysis of the operators. These cost formulas are especially important because we aim to deploy HANA on other memory technologies, e.g. non-volatile memory instead of DRAM.
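The measure-then-fit workflow described above can be illustrated with a minimal sketch. It is not the thesis harness: `scan_operator` is a hypothetical pure-Python stand-in for a column-store operator, the names `measure` and `fit_linear` are assumptions made for illustration, and a simple linear cost model cost(n) = c0 + c1 * n is assumed.

```python
import time
from statistics import median


def scan_operator(column, threshold):
    """Hypothetical stand-in for a column scan: count values below a threshold."""
    return sum(1 for v in column if v < threshold)


def measure(operator, sizes, repeats=5):
    """Return the median wall-clock time in seconds of `operator` per input size."""
    timings = []
    for n in sizes:
        column = list(range(n))  # synthetic column with n values
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            operator(column, n // 2)
            samples.append(time.perf_counter() - start)
        timings.append(median(samples))  # median dampens timing outliers
    return timings


def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


if __name__ == "__main__":
    sizes = [10_000, 20_000, 40_000, 80_000]
    times = measure(scan_operator, sizes)
    c0, c1 = fit_linear(sizes, times)
    print(f"cost(n) ~ {c0:.3e} + {c1:.3e} * n seconds")
```

The fitted pair (c0, c1) plays the role of the cost-formula coefficients. In a real harness, the operator call would be replaced by an invocation of the operator under test, and the measurements would be repeated on each memory technology of interest.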

Background

Estimating the cost of a query execution plan is standard practice in every major database product today. Cost estimation requires that the costs and cardinality estimates of the operators used in a query execution plan are known.

There is little published research on methods for deriving cost functions. The generic techniques available are linear and multivariate regression analysis, known from statistics, and analysis of the algorithmic complexity of the operators. Research has been published on analyzing access to disk and to DRAM, but no analysis is available for NVRAM.
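As an illustration of the multivariate case, the following sketch fits a cost function with several input variables by solving the least-squares normal equations. The function name `fit_multivariate` and the example features (input rows and number of distinct values) are assumptions chosen for illustration only; they are not taken from HANA, and a production setup would more likely rely on an established regression library.

```python
def fit_multivariate(features, ys):
    """Least squares for y = c0 + c1*x1 + ... + ck*xk via the normal equations.

    `features` is a list of feature tuples (x1, ..., xk), one per measurement.
    Returns the coefficient list [c0, c1, ..., ck].
    """
    design = [[1.0] + [float(x) for x in row] for row in features]
    k = len(design[0])

    # Build the augmented system (X^T X | X^T y).
    a = []
    for i in range(k):
        row = [sum(d[i] * d[j] for d in design) for j in range(k)]
        row.append(sum(d[i] * y for d, y in zip(design, ys)))
        a.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]

    return [a[i][k] / a[i][i] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical measurements: (input rows, distinct values) -> cost.
    features = [(1, 1), (2, 1), (1, 2), (3, 4), (4, 2)]
    costs = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in features]
    print(fit_multivariate(features, costs))
```

Which variables enter the model is exactly what the complexity analysis of an operator would have to determine. The regression only estimates the coefficients once the form of the function is fixed.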