About
Word embeddings encode rich semantic as well as syntactic features and are therefore useful in many tasks, especially in Natural Language Processing and Information Retrieval. To expose the rich information stored in word embeddings and make it usable within relational database systems, we propose FREDDY (Fast woRd EmbedDings Database sYstems), an extended relational database system based on PostgreSQL. We introduce a wide range of UDFs that form the basis for novel query types, allowing the user to analyze structured knowledge in database relations together with huge, unstructured text corpora encoded as word embeddings. Supported by different index structures and approximation techniques, these operations perform fast similarity computations on high-dimensional vector spaces. A web application makes it possible to explore these novel query capabilities on different database schemas and on a variety of word embeddings trained on different text corpora. From a systems perspective, the user can examine how the different approximation techniques for similarity search, and their parameters, affect query execution time and precision.
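To illustrate the core operation behind these UDFs, here is a minimal sketch (plain Python, outside the database) of exact cosine-similarity k-nearest-neighbor search over word embeddings. The function names and the toy vectors are illustrative assumptions, not FREDDY's actual API; FREDDY additionally accelerates this computation with index structures and approximation techniques inside PostgreSQL.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn(query_word, embeddings, k=2):
    """Return the k words most similar to query_word (exact, brute-force)."""
    q = embeddings[query_word]
    scored = [(word, cosine_similarity(q, vec))
              for word, vec in embeddings.items() if word != query_word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real word vectors have hundreds of dimensions.
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

print(knn("king", embeddings, k=1))
```

The brute-force scan above is linear in the number of vectors, which is exactly why approximate index structures matter once the vocabulary grows to millions of words.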
Demonstrator Screencast
FREDDY on GitHub
RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data
Related Publications
@conference{guenther2019joins,
author = {Michael G\"{u}nther and Maik Thiele and Wolfgang Lehner},
title = {Fast Approximated Nearest Neighbor Joins For Relational Database Systems},
booktitle = {18. Fachtagung ``Datenbanksysteme f\"{u}r Business, Technologie und Web''},
year = {2019},
month = {3}
}

@conference{guenther2019explore,
author = {Michael G\"{u}nther and Zdravko Yanakiev and Maik Thiele and Wolfgang Lehner},
title = {Explore FREDDY: Fast Word Embeddings in Database Systems},
year = {2019},
month = {3}
}

@conference{guenther2018freddy,
author = {Michael G\"{u}nther},
title = {FREDDY: Fast Word Embeddings in Database Systems},
booktitle = {Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18)},
year = {2018},
month = {6}
}