Christian Bizer


2016

pdf
A Large DataBase of Hypernymy Relations Extracted from the Web.
Julian Seitner | Christian Bizer | Kai Eckert | Stefano Faralli | Robert Meusel | Heiko Paulheim | Simone Paolo Ponzetto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hypernymy relations (those where an hyponym term shares a “isa” relationship with his hypernym) play a key role for many Natural Language Processing (NLP) tasks, e.g. ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus for extracting the hypernymy relations and store them effectively into a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) to help other researchers programmatically query the database.

2012

pdf
Evaluating the Impact of Phrase Recognition on Concept Tagging
Pablo Mendes | Joachim Daiber | Rohana Rajapakse | Felix Sasaki | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We have developed DBpedia Spotlight, a flexible concept tagging system that is able to annotate entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we evaluate the impact of the phrase recognition step on the ability of the system to correctly reproduce the annotations of a gold standard in an unsupervised setting. We argue that a combination of techniques is needed, and we evaluate a number of alternatives according to an existing evaluation set.

pdf
DBpedia: A Multilingual Cross-domain Knowledge Base
Pablo Mendes | Max Jakob | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using the SPARQL query language and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base and as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks. These task include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at added value that the structured data of DBpedia provides.