Matthias Drews


2024

GIL-GALaD: Gender Inclusive Language - German Auto-Assembled Large Database
Anna-Katharina Dick | Matthias Drews | Valentin Pickard | Victoria Pierz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

As gender-inclusive language has become a highly debated topic in recent years, gendered biases in speech are unfortunately often picked up and propagated by modern language models trained on large amounts of text. While remedial efforts are underway, grammatically gendered languages such as German pose unique challenges in generating gender-inclusive language for corrective model training or fine-tuning. We assembled GIL-GALaD, a corpus of German gender-inclusive language drawn from different sources such as social media, news articles, public speeches and academic publications. Our corpus includes the most common types of modifications of generic masculine noun forms and spans 30 years (1993–2023), containing over 800,000 instances of gender-inclusive language. Tools for corpus usage and extension are to be included in the release. During corpus assembly, we were also able to gain insights into which types of gender-inclusive language were used in practice over the years and across different domains.
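The abstract mentions "the most common types of modifications of generic masculine forms" without listing them; well-known variants in German include the gender star (Lehrer*innen), colon (Lehrer:innen), underscore (Lehrer_innen), Binnen-I (LehrerInnen) and slash forms (Lehrer/-innen). The sketch below illustrates how such forms might be matched with simple regular expressions. The pattern set and the `find_inclusive_forms` helper are illustrative assumptions, not the tooling released with GIL-GALaD, and the patterns are deliberately simplified (they can over- or under-match).

```python
import re

# Illustrative sketch, NOT the GIL-GALaD corpus tooling: regex patterns
# for common German gender-inclusive noun modifications. Simplified on
# purpose; real extraction would need more careful morphology handling.
PATTERNS = {
    "gender_star": re.compile(r"\b\w+\*in(?:nen)?\b"),   # Lehrer*innen
    "colon":       re.compile(r"\b\w+:in(?:nen)?\b"),    # Lehrer:innen
    "underscore":  re.compile(r"\b\w+_in(?:nen)?\b"),    # Lehrer_innen
    "binnen_i":    re.compile(r"\b\w+In(?:nen)?\b"),     # LehrerInnen
    "slash":       re.compile(r"\b\w+/-?in(?:nen)?\b"),  # Lehrer/-innen
}

def find_inclusive_forms(text: str) -> dict[str, list[str]]:
    """Return the matches found in `text`, keyed by modification type."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

if __name__ == "__main__":
    # "The teachers and pupils met the colleagues."
    sample = "Die Lehrer*innen und SchülerInnen trafen die Kolleg:innen."
    for kind, hits in find_inclusive_forms(sample).items():
        if hits:
            print(kind, hits)
```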

2023

CICL_DMS at SemEval-2023 Task 11: Learning With Disagreements (Le-Wi-Di)
Dennis Grötzinger | Simon Heuschkel | Matthias Drews
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this system paper, we describe our submission for the 11th task of SemEval-2023: Learning with Disagreements, or Le-Wi-Di for short. The task challenges the assumption that NLP tasks such as hate speech or misogyny detection have a single gold label, considering the opinions of multiple annotators instead. The goal is to capture the annotators' agreements and disagreements. For our system, we use modern large language models as our backbone and investigate various techniques built on top, such as ensemble learning, multi-task learning, and Gaussian processes. Our final submission shows promising results, achieving an upper-half finish.
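The abstract names the building blocks (ensembles, multi-task learning, Gaussian processes) without specifying how disagreement is represented. One common way to capture annotator agreement/disagreement in Le-Wi-Di-style setups is to train against soft labels, i.e. the distribution of annotator votes rather than a single gold label. The PyTorch snippet below is a minimal sketch of that idea under the assumption of a generic classifier head; the `soft_label_loss` function is hypothetical and is not the authors' actual system.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' system): train a classifier against
# soft labels derived from annotator votes instead of a single gold label.
# `logits` would come from any backbone, e.g. a transformer encoder head.

def soft_label_loss(logits: torch.Tensor, vote_counts: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the model's distribution and annotator vote shares.

    logits:      (batch, num_classes) raw model outputs
    vote_counts: (batch, num_classes) how many annotators chose each class
    """
    soft_targets = vote_counts / vote_counts.sum(dim=-1, keepdim=True)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Example: for the first item, 3 of 5 annotators chose class 1;
# for the second, all 5 agreed on class 0.
logits = torch.randn(2, 2, requires_grad=True)
votes = torch.tensor([[2.0, 3.0], [5.0, 0.0]])
loss = soft_label_loss(logits, votes)
loss.backward()
print(f"soft-label loss: {loss.item():.4f}")
```

Trained this way, the model is rewarded for matching the annotator distribution, so items with high disagreement push predictions toward uncertainty rather than an arbitrary hard label.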