Fourth Annual SIGMOD Programming Contest :: A Multidimensional Indexing System

Sorry, the SIGMOD 2012 Programming Contest is over.
Thanks to everyone who participated! Details can be found in the Results section of this site.

Student teams from degree granting institutions are invited to compete in the fourth annual SIGMOD programming contest. This year's task is to implement a multidimensional high-throughput in-memory indexing system. Submissions will be judged based on their overall performance on a variety of different workloads. A shortlist of finalists will be invited to present their implementations at SIGMOD 2012 in Scottsdale, Arizona, USA, where the winning team will be awarded a prize of $5,000 USD. To support the contestants, up to two students from each finalist team will receive travel grants to attend the conference.

News and Updates

Please subscribe to our mailing list to stay up-to-date.

Task overview

The task for this year’s contest is the implementation of a multidimensional high-throughput in-memory index structure that supports common database operations such as point and range queries as well as data manipulation. Application scenarios for such multidimensional indexes are, for example, multimedia indexing, CAD or geospatial data.
The index needs to support transactions and will be queried in parallel by many threads, each one of them issuing one transaction at a time followed by the next one. The index has to fit entirely into the available main-memory and does not require any crash recovery.

We provide a basic interface for index creation, insert, update, delete, point and range queries. As point queries are special range queries, both cases will be handled by one function (optimizations to be handled inside each implementation). All data will be given in the form of a record represented by a multidimensional key and some raw binary data (payload). The workload includes exact- (all index attributes specified) and partial- (subset of index attributes specified) match point queries (conjunctive predicates only), range queries (with ordering), as well as exact-match data manipulations.

The winner will be the submission that completes all of the queries with the smallest average execution time (we will provide a leaderboard that measures the average number of transactions per second) while also passing a set of correctness tests (some of which we will make public).

Important Dates

Dec. 14, 2011 Detailed specifications of the task, API, and requirements available
Mar. 25, 2012 (extended) Contest submission deadline
Apr. 20, 2012 Finalists notification
May 6, 2012 Contest submission deadline for second round (finalists only)
May 20-25, 2012 SIGMOD conference

Task Details

In this section, we give details about the task requirements. To evaluate your solution, we will run a variety of different benchmarks with different workloads on it. There is a default configuration, which we use for the majority of benchmarks, but in order to get a general solution, our benchmark suite includes tests that take each parameter to its limits.

  • Indexes: The benchmark will create multiple indexes that coexist beside each other. In general, the benchmark creates 16 indexes, but this number changes from 1 up to 32.
  • Dimensions: The index must be able to support up to 32 dimensions. The majority of benchmarks will operate with a dimensionality of 4.
  • Duplicates: Duplicate keys are allowed
  • Attribute Types: Your index needs to support three attribute types: INT(4), INT(8) and VARCHAR(512)
  • Number of Tuples: We will test your solution with few (some thousands) and many (some millions or billions) tuples.
  • Data/RAM ratio: The benchmark will store a dimensionality-dependent maximum percentage of the amount of available RAM at the same time in the index.
  • Update Ratio: The default update ratio is 20%, but we change this in a range from 0% to 40%
  • Data/Query Distribution: The workload uses only two types of distributions: The normal distribution as default and the Zipf distribution. We also use correlated and independent (default) data.
  • Transactions: The index has to support transactions with isolation level read committed. This means that non-repeatable reads and phantoms are allowed, but dirty reads are not.
    Each transaction of the benchmark works on a
    single index. The average number of operations that are performed in such a transaction depends on its type, where each operation is exactly one function call (e.g., iterating over 10 keys requires 11 operations).
    There are three possible transaction types:
    • Range queries only (200 operations per transaction)
    • Point queries only (20 operations per transaction)
    • Manipulation queries only (5 operations per transaction)
    Note: The information above just describes the transactions used by our benchmark. Your implementation has to support transactions that contain both reading and modifying operations.
  • Ordering: The output of range and partial-match point queries has to be order preserving (lexicographical byte order for VARCHARs). Assume a partial-match query (a, b = 3, c). The order must be equal to "... ORDER BY a ASC, c ASC" for all keys with b equal to 3.
  • Concurrency: The indexes will be queried by many threads in parallel. Most benchmarks use the number of available hardware threads on the underlying platform, but the number of threads varies from one thread up to twice the number of hardware threads. 
Parameter Minimum Default Maximum
# of indexes 1 16 32
Dimensions 1 4 32
# of tuples 1 - some billions
Size of payload 1 Byte 8 Byte 4096 Byte
Data to RAM - 10% 10%
Update Ratio 0% 20% 40%
# of threads 1 #HWT 2 * #HWT
Isolation level read committed
Duplicate keys yes

Interface Description

The detailed interface is available in the Resources-section.

Benchmark

We provide a configurable benchmark that produces workloads, which consist of three transaction types as stated in the task description. Contestants are able to use this benchmark for their private evaluation. However, we use the same benchmark driver for the leaderboard and the final evaluation. The leaderboard driver will use the given default values, while the final evaluation additionally uses some other common workloads and workloads that take each individual parameter to its limit.

Submitting Your Implementation

While we encourage you to test your implementation locally by providing a basic benchmark and a set of unit tests, we offer an online evaluation system allowing you to run the benchmark on our test machine. Of course we only run the public benchmark, which consists of smaller and less critical performance tests.

We also provide a leaderboard to help you to keep track of your progress and compare the performance of your implementation with implementations by other contestants.

Test hardware specifications:
Your implementation will be evaluated on a 64-bit Dual E5645 (each consisting of 6 cores, which makes a total of 12 physical cores) x86 machine running Ubuntu 10.04. The machine has 48 GB of RAM; however, certain tests will be limited to far less memory. See the table below for details of the evaluation platform.

ProcessorIntel Xeon E5645
Processor Speed2.40 GHz
Configuration2 processors (12 cores total)
L1 Cache Size64 KB/core
L2 Cache Size256 KB/core
L3 Cache Size2x 12MB
Main Memory48 GB
Hyper-ThreadingEnabled
Operating SystemUbuntu 10.04.3 LTS (kernel version 2.6.32-33-server)

In order to run your code on our machine as well as for your final submission, you are required to submit a shared 64-bit linux compatible library (*.so), which implements the given interfaces. We will link our test drivers against this library and measure the overall execution time for each workload.

For the final submission we additionally need the following things:

  • The complete source code of your implementation for our code reviews
  • A makefile that produces a shared library using the GCC, just by running the make command
  • A README file that contains the names of all team members, the institute where they are studying and your contact information (E-Mail).

You can submit your implementations using your dashboard. The submission deadline is March 25 at 23:59 UTC.
The submission deadline for the second round (finalists only) is May 6 at 23:59 UTC.

Contest rules

The ACM SIGMOD 2012 Programming Contest is open to undergraduate and graduate students all over the world. A team may be formed by one or more students, who may not necessarily be enrolled at the same institution. All team members must be actively enrolled as a student at an accredited degree-granting institution at the beginning of the competition (Dec. 14, 2011). Several teams from the same institution can compete independently. Students associated with the organizers are not eligible to enter. Students who are enrolled at TU-Dresden and still want to join the competition, please contact Thomas Kissinger.

All teams have to supply their source codes, in order to allow us to review the code, and agree to license their code under the MIT License. The finalist implementations will be made public on this website. Teams may choose to use third-party libraries, as long as they do not implement core functions of the system and are compatible with the MIT License (So using a hash map implementation might be allowed, but using a prebuilt indexing system is not). Be aware that the usage of a third-party library can lead to a disqualification, if one of these rules is violated.

Organizers

This contest is organized by the Database Technology Group at TU-Dresden in collaboration with the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
For questions concerning the contest, especially the task, please subscribe to our mailing list and send your question or concern to sigmod12contest@groups.tu-dresden.de (mailing list archive). For questions that are not intended for public contact Thomas Kissinger.

Promotion Material

A poster that can be used to promote the contest at universities is available here:
ACM SIGMOD 2012 Programming Contest Poster (DIN A3)

It is intended for A3 printing, but will still be readable on a single page A4 sized paper.

We have also created a version that is optimized for printing on letter-sized paper:
ACM SIGMOD 2012 Programming Contest Poster (US Letter)

Sponsorship

The contest is supported by a grant from the National Science Foundation. Prizes are gracefully donated by SAP (SAP HANA) and Microsoft.

Last Update: September 19, 2012 11:11 CEST
Author Database Technology Group, Technische Universität Dresden

Contest Contact
Research Assistant
Dipl.-Ing.
Thomas Kissinger



Tel.: +49 351 463-38589
Fax: +49 351 463-38259
Student Research Assistant
Lukas M. Maas

Contact
Head
Prof. Dr.-Ing.
Wolfgang Lehner


Tel.: +49 351 463-38383
Fax: +49 351 463-38259


Visiting Address (View)
Nöthnitzer Str. 46
Room 3109
01187 Dresden

Postal Address
Technische Universität Dresden
Dep. of Computer Science
Institute for System Architecture
Database Technology Group
01062 Dresden
Germany