Aleksandra Gabryszak


MobASA: Corpus for Aspect-based Sentiment Analysis and Social Inclusion in the Mobility Domain
Aleksandra Gabryszak | Philippe Thomas
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

In this paper we show how aspect-based sentiment analysis might help public transport companies to improve their social responsibility for accessible travel. We present MobASA: a novel German-language corpus of tweets annotated with their relevance for public transportation, and with sentiment towards aspects related to barrier-free travel. We identified and labeled topics important for passengers limited in their mobility due to disability, age, or when travelling with young children. The data can be used to identify hurdles and improve travel planning for vulnerable passengers, as well as to monitor a perception of transportation businesses regarding the social inclusion of all passengers. The data is publicly available under:


MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain
Leonhard Hennig | Phuc Tran Truong | Aleksandra Gabryszak
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)


Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the recent progress, little is known about the features captured by state-of-the-art neural relation extraction (RE) models. Common methods encode the source sentence, conditioned on the entity mentions, before classifying the relation. However, the complexity of the task makes it difficult to understand how encoder architecture and supporting linguistic knowledge affect the features learned by the encoder. We introduce 14 probing tasks targeting linguistic properties relevant to RE, and we use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets, TACRED and SemEval 2010 Task 8. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance. For example, adding contextualized word representations greatly increases performance on probing tasks with a focus on named entity and part-of-speech information, and yields better results in RE. In contrast, entity masking improves RE, but considerably lowers performance on entity type related probing tasks.

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE). But, even with recent advances in unsupervised pre-training and knowledge enhanced neural RE, models still show a high error rate. In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? And how do crowd annotations, dataset, and models contribute to this error rate? To answer these questions, we first validate the most challenging 5K examples in the development and test sets using trained annotators. We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled. On the relabeled test set the average F1 score of a large baseline model set improves from 62.1 to 70.1. After validation, we analyze misclassifications on the challenging instances, categorize them into linguistically motivated error groups, and verify the resulting error hypotheses on three state-of-the-art RE models. We show that two groups of ambiguous relations are responsible for most of the remaining errors and that models may adopt shallow heuristics on the dataset when entities are not masked.


A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
Martin Schiersch | Veselina Mironova | Maximilian Schmitt | Philippe Thomas | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Saskia Schön | Veselina Mironova | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


pdf bib
Common Round: Application of Language Technologies to Large-Scale Web Debates
Hans Uszkoreit | Aleksandra Gabryszak | Leonhard Hennig | Jörg Steffen | Renlong Ai | Stephan Busemann | Jon Dehdari | Josef van Genabith | Georg Heigold | Nils Rethmeier | Raphael Rubino | Sven Schmeier | Philippe Thomas | He Wang | Feiyu Xu
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input, aggregation of the formulated opinions and arguments. The platform also provides a cross-lingual access to debates using machine translation.


Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Aleksandra Gabryszak | Sebastian Krause | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations of situations, linked to lexemes and their valency patterns, sar-graphs are knowledge resources that connect semantic relations from factual knowledge graphs to the linguistic phrases used to express instances of these relations. We analyze the conceptual similarities and differences of both resources and propose to link sar-graphs and FrameNet on the levels of relations/frames as well as phrases. The former alignment involves a manual ontology mapping step, which allows us to extend sar-graphs with new phrase patterns from FrameNet. The phrase-level linking, on the other hand, is fully automatic. We investigate the quality of the automatically constructed links and identify two main classes of errors.


Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language
Sebastian Krause | Leonhard Hennig | Aleksandra Gabryszak | Feiyu Xu | Hans Uszkoreit
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications


An analysis of textual inference in German customer emails
Kathrin Eichler | Aleksandra Gabryszak | Günter Neumann
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)