Pablo Mendes

2017

Recognizing Mentions of Adverse Drug Reaction in Social Media Using Knowledge-Infused Recurrent Models
Gabriel Stanovsky | Daniel Gruhl | Pablo Mendes
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Recognizing mentions of Adverse Drug Reactions (ADR) in social media is challenging: ADR mentions are context-dependent and include long, varied and unconventional descriptions as compared to more formal medical symptom terminology. We use the CADEC corpus to train a recurrent neural network (RNN) transducer, integrated with knowledge graph embeddings of DBpedia, and show the resulting model to be highly accurate (93.4 F1). Furthermore, even when lacking high quality expert annotations, we show that by employing an active learning technique and using purpose built annotation tools, we can train the RNN to perform well (83.9 F1).

2016

Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp | Pablo Mendes | Heiko Paulheim | Filip Ilievski | Julien Plu | Giuseppe Rizzo | Joerg Waitelonis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Entity linking has become a popular task in both natural language processing and semantic web communities. However, we find that the benchmark datasets for entity linking tasks do not accurately evaluate entity linking systems. In this paper, we aim to chart the strengths and weaknesses of current benchmark datasets and sketch a roadmap for the community to devise better benchmark datasets.

2015

2012

Evaluating the Impact of Phrase Recognition on Concept Tagging
Pablo Mendes | Joachim Daiber | Rohana Rajapakse | Felix Sasaki | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We have developed DBpedia Spotlight, a flexible concept tagging system that is able to annotate entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we evaluate the impact of the phrase recognition step on the ability of the system to correctly reproduce the annotations of a gold standard in an unsupervised setting. We argue that a combination of techniques is needed, and we evaluate a number of alternatives against an existing evaluation set.

DBpedia: A Multilingual Cross-domain Knowledge Base
Pablo Mendes | Max Jakob | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multilingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using the SPARQL query language, and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks. These tasks include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. We outline these use cases, pointing to the added value that the structured data of DBpedia provides.