Max Jakob
2012
DBpedia: A Multilingual Cross-domain Knowledge Base
Pablo Mendes
|
Max Jakob
|
Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using the SPARQL query language and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base and as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks. These task include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at added value that the structured data of DBpedia provides.
2010
Mapping between Dependency Structures and Compositional Semantic Representations
Max Jakob
|
Markéta Lopatková
|
Valia Kordoni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and (Robust) Minimal Recursion Semantics ((R)MRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like (R)MRS. A mapping algorithm that converts PDT trees to (R)MRS structures is developed, associating (R)MRSs to each node on the dependency tree. Furthermore, composition rules are formulated and the relation between dependency in PDT and semantic heads in (R)MRS is analyzed. It turns out that structure and dependencies, morphological categories and some coreferences can be preserved in the target structures. Moreover, valency and free modifications are distinguished using the valency dictionary of PDT as an additional resource. The validation results show that systematically correct underspecified target representations can be obtained by a rule-based mapping approach, which is an indicator that (R)MRS is indeed robust in relation to the formal representation of Czech data. This finding is novel, for Czech, with its free word order and rich morphology, is typologically different than languages analyzed with (R)MRS to date.