Knowledge bases and taxonomies have become valuable resources in a wide range of applications involving data understanding and reasoning. Automatic information extraction and scalable graph construction approaches make it possible to build very large knowledge bases from various sources, such as the Web. Large knowledge bases with high coverage are especially important for open-domain applications.
However, both the extraction of concept relations and the construction of a consistent knowledge graph are very expensive and time-consuming.
The objective of this thesis is to develop an efficient extraction and integration process that enables the construction of a large IsA knowledge base from various types of Web data and scales well with the size of the data source (and, therefore, with the number of extracted concept relations). The student is encouraged to base the process on the Map/Reduce model.
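To make the Map/Reduce framing concrete, the following is a minimal sketch of how IsA extraction and integration might be split into a map step and a reduce step. It is an illustration under stated assumptions, not the process the thesis is expected to deliver: the single "X such as A, B and C" pattern, the function names (map_extract, reduce_count, run_job), and the use of a support count as the integration step are all hypothetical, and the job runs in a single process rather than on a Map/Reduce cluster.

```python
import re
from collections import defaultdict

# Assumption: one lexical pattern, "<hypernym> such as <item>, <item> and <item>".
# A real extractor would need many more patterns and proper noun-phrase handling.
PATTERN = re.compile(
    r"(\w+) such as (\w+(?:(?:, and |, or |, | and | or )\w+)*)"
)
# Splits the enumerated items of a match into individual hyponyms.
SPLIT = re.compile(r",? and |,? or |, ")


def map_extract(document):
    """Map step: emit ((hyponym, hypernym), 1) for every IsA candidate found."""
    for match in PATTERN.finditer(document.lower()):
        hypernym = match.group(1)
        for hyponym in SPLIT.split(match.group(2)):
            yield (hyponym, hypernym), 1


def reduce_count(key, values):
    """Reduce step: merge duplicate relations into one entry with a support count."""
    return key, sum(values)


def run_job(documents):
    """Single-process stand-in for the shuffle phase: group map output by key."""
    groups = defaultdict(list)
    for document in documents:
        for key, value in map_extract(document):
            groups[key].append(value)
    return dict(reduce_count(key, values) for key, values in groups.items())
```

Because each document is mapped independently and each relation is reduced independently, both steps can be distributed, which is what lets the process scale with the size of the data source. For example, running run_job on the two sentences "Animals such as cats, dogs and horses are common." and "We saw animals such as dogs." yields the relation (dogs, animals) with a support count of 2 and the relations (cats, animals) and (horses, animals) with a count of 1 each.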