Aleksandra Gabryszak

2021

pdf bib
MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain
Leonhard Hennig | Phuc Tran Truong | Aleksandra Gabryszak
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

2020

pdf bib abs
Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the recent progress, little is known about the features captured by state-of-the-art neural relation extraction (RE) models. Common methods encode the source sentence, conditioned on the entity mentions, before classifying the relation. However, the complexity of the task makes it difficult to understand how encoder architecture and supporting linguistic knowledge affect the features learned by the encoder. We introduce 14 probing tasks targeting linguistic properties relevant to RE, and we use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets, TACRED and SemEval 2010 Task 8. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance. For example, adding contextualized word representations greatly increases performance on probing tasks with a focus on named entity and part-of-speech information, and yields better results in RE. In contrast, entity masking improves RE, but considerably lowers performance on entity type related probing tasks.

pdf bib abs
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE). But, even with recent advances in unsupervised pre-training and knowledge enhanced neural RE, models still show a high error rate. In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? And how do crowd annotations, dataset, and models contribute to this error rate? To answer these questions, we first validate the most challenging 5K examples in the development and test sets using trained annotators. We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled. On the relabeled test set the average F1 score of a large baseline model set improves from 62.1 to 70.1. After validation, we analyze misclassifications on the challenging instances, categorize them into linguistically motivated error groups, and verify the resulting error hypotheses on three state-of-the-art RE models. We show that two groups of ambiguous relations are responsible for most of the remaining errors and that models may adopt shallow heuristics on the dataset when entities are not masked.

2018

pdf bib
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
Martin Schiersch | Veselina Mironova | Maximilian Schmitt | Philippe Thomas | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Saskia Schön | Veselina Mironova | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input, aggregation of the formulated opinions and arguments. The platform also provides a cross-lingual access to debates using machine translation.

2016

pdf bib abs
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Aleksandra Gabryszak | Sebastian Krause | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations of situations, linked to lexemes and their valency patterns, sar-graphs are knowledge resources that connect semantic relations from factual knowledge graphs to the linguistic phrases used to express instances of these relations. We analyze the conceptual similarities and differences of both resources and propose to link sar-graphs and FrameNet on the levels of relations/frames as well as phrases. The former alignment involves a manual ontology mapping step, which allows us to extend sar-graphs with new phrase patterns from FrameNet. The phrase-level linking, on the other hand, is fully automatic. We investigate the quality of the automatically constructed links and identify two main classes of errors.