Aleksandra Gabryszak

2024

pdf abs
Enhancing Editorial Tasks: A Case Study on Rewriting Customer Help Page Contents Using Large Language Models
Aleksandra Gabryszak | Daniel Röder | Arne Binder | Luca Sion | Leonhard Hennig
Proceedings of the 17th International Natural Language Generation Conference

In this paper, we investigate the use of large language models (LLMs) to enhance the editorial process of rewriting customer help pages. We introduce a German-language dataset comprising Frequently Asked Question-Answer pairs, presenting both raw drafts and their revisions by professional editors. On this dataset, we evaluate the performance of four large language models (LLM) through diverse prompts tailored for the rewriting task. We conduct automatic evaluations of content and text quality using ROUGE, BERTScore, and ChatGPT. Furthermore, we let professional editors assess the helpfulness of automatically generated FAQ revisions for editorial enhancement. Our findings indicate that LLMs can produce FAQ reformulations beneficial to the editorial process. We observe minimal performance discrepancies among LLMs for this task, and our survey on helpfulness underscores the subjective nature of editors’ perspectives on editorial refinement.

pdf abs
Large Language Models Are Echo Chambers
Jan Nehring | Aleksandra Gabryszak | Pascal Jürgens | Aljoscha Burchardt | Stefan Schaffer | Matthias Spielkamp | Birgit Stark
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Modern large language models and chatbots based on them show impressive results in text generation and dialog tasks. At the same time, these models are subject to criticism in many aspects, e.g., they can generate hate speech and untrue and biased content. In this work, we show another problematic feature of such chatbots: they are echo chambers in the sense that they tend to agree with the opinions of their users. Social media, such as Facebook, was criticized for a similar problem and called an echo chamber. We experimentally test five LLM-based chatbots, which we feed with opinionated inputs. We annotate the chatbot answers whether they agree or disagree with the input. All chatbots tend to agree. However, the echo chamber effect is not equally strong. We discuss the differences between the chatbots and make the dataset publicly available.

2023

pdf
Factuality Detection using Machine Translation – a Use Case for German Clinical Text
Mohammed Bin Sumait | Aleksandra Gabryszak | Leonhard Hennig | Roland A. Roller
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

2022

pdf abs
MobASA: Corpus for Aspect-based Sentiment Analysis and Social Inclusion in the Mobility Domain
Aleksandra Gabryszak | Philippe Thomas
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

In this paper we show how aspect-based sentiment analysis might help public transport companies to improve their social responsibility for accessible travel. We present MobASA: a novel German-language corpus of tweets annotated with their relevance for public transportation, and with sentiment towards aspects related to barrier-free travel. We identified and labeled topics important for passengers limited in their mobility due to disability, age, or when travelling with young children. The data can be used to identify hurdles and improve travel planning for vulnerable passengers, as well as to monitor a perception of transportation businesses regarding the social inclusion of all passengers. The data is publicly available under: https://github.com/DFKI-NLP/sim3s-corpus

2021

pdf
MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain
Leonhard Hennig | Phuc Tran Truong | Aleksandra Gabryszak
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

2020

pdf abs
Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the recent progress, little is known about the features captured by state-of-the-art neural relation extraction (RE) models. Common methods encode the source sentence, conditioned on the entity mentions, before classifying the relation. However, the complexity of the task makes it difficult to understand how encoder architecture and supporting linguistic knowledge affect the features learned by the encoder. We introduce 14 probing tasks targeting linguistic properties relevant to RE, and we use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets, TACRED and SemEval 2010 Task 8. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance. For example, adding contextualized word representations greatly increases performance on probing tasks with a focus on named entity and part-of-speech information, and yields better results in RE. In contrast, entity masking improves RE, but considerably lowers performance on entity type related probing tasks.

pdf abs
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task
Christoph Alt | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE). But, even with recent advances in unsupervised pre-training and knowledge enhanced neural RE, models still show a high error rate. In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? And how do crowd annotations, dataset, and models contribute to this error rate? To answer these questions, we first validate the most challenging 5K examples in the development and test sets using trained annotators. We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled. On the relabeled test set the average F1 score of a large baseline model set improves from 62.1 to 70.1. After validation, we analyze misclassifications on the challenging instances, categorize them into linguistically motivated error groups, and verify the resulting error hypotheses on three state-of-the-art RE models. We show that two groups of ambiguous relations are responsible for most of the remaining errors and that models may adopt shallow heuristics on the dataset when entities are not masked.

2018

pdf
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
Martin Schiersch | Veselina Mironova | Maximilian Schmitt | Philippe Thomas | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Saskia Schön | Veselina Mironova | Aleksandra Gabryszak | Leonhard Hennig
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input, aggregation of the formulated opinions and arguments. The platform also provides a cross-lingual access to debates using machine translation.

2016

pdf abs
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Aleksandra Gabryszak | Sebastian Krause | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations of situations, linked to lexemes and their valency patterns, sar-graphs are knowledge resources that connect semantic relations from factual knowledge graphs to the linguistic phrases used to express instances of these relations. We analyze the conceptual similarities and differences of both resources and propose to link sar-graphs and FrameNet on the levels of relations/frames as well as phrases. The former alignment involves a manual ontology mapping step, which allows us to extend sar-graphs with new phrase patterns from FrameNet. The phrase-level linking, on the other hand, is fully automatic. We investigate the quality of the automatically constructed links and identify two main classes of errors.