Martin Brümmer


2016

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
Martin Brümmer | Milan Dojchinovski | Sebastian Hellmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The ever-increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need for large-scale training and evaluation corpora. Due to its size, openness and relative quality, Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. The paper describes the properties of the Wikipedia texts, the corpus creation process, the corpus format, and interesting use cases such as Named Entity Linking training and evaluation.

2014

NIF4OGGD - NLP Interchange Format for Open German Governmental Data
Mohamed Sherif | Sandro Coelho | Ricardo Usbeck | Sebastian Hellmann | Jens Lehmann | Martin Brümmer | Andreas Both
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the last couple of years, the amount of structured open government data has increased significantly. Citizens can already leverage the advantages of open data through increased transparency and better opportunities to take part in governmental decision-making processes. Our approach increases the interoperability of existing but distributed open governmental datasets by converting them to the RDF-based NLP Interchange Format (NIF). Furthermore, we integrate the converted data into a geodata store and present a user interface for querying this data via a keyword-based search. The language resource generated in this project is publicly available for download and via a dedicated SPARQL endpoint.

2013

Lemon-aid: using Lemon to aid quantitative historical linguistic analysis
Steven Moran | Martin Brümmer
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data