Qi Yu
For the analysis of political discourse, a reliable identification of group references, i.e., linguistic components that refer to individuals or groups of people, is useful. However, the task of automatically recognizing group references has not yet gained much attention within NLP. To address this gap, we introduce GRIT (Group Reference for Italian), a large-scale, multi-domain, manually annotated dataset for group reference recognition in Italian. GRIT represents a new resource for automatic and generalizable recognition of group references. With this dataset, we aim to establish group reference recognition as a valid classification task, which extends the domain of Named Entity Recognition by expanding its focus to literal and figurative mentions of social groups. We verify the potential of automated group reference recognition for Italian through an experiment employing a fine-tuned BERT model. Our experimental results substantiate the validity of the task, suggesting substantial potential for applying automated systems to multiple fields of analysis, such as political text or social media analysis.
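As a rough illustration of the kind of experiment described here, the sketch below runs an Italian BERT checkpoint as a token classifier over a sentence. The checkpoint name, the BIO label set, and the example sentence are assumptions for illustration only; they are not the authors' released artifacts, and the classification head would in practice be fine-tuned on GRIT.

```python
# Minimal sketch of BERT-based group reference tagging (token classification).
# Checkpoint, label set, and example are illustrative assumptions, not the
# paper's released model or data.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "dbmdz/bert-base-italian-cased"   # assumed Italian base model
LABELS = ["O", "B-GROUP", "I-GROUP"]           # hypothetical BIO label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)  # in practice this classification head would be fine-tuned on GRIT

sentence = "I lavoratori del settore pubblico protestano contro il governo."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(dim=-1)[0]

for token, pred in zip(
    tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions
):
    print(f"{token}\t{LABELS[pred.item()]}")
```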
In this paper we focus on a subclass of multi-word expressions, namely compound formation in German. The automatic detection of compounds is a known problem, and we argue that its resolution should be given more urgency in light of a new role we uncovered with respect to ad hoc compound formation: the systematic expression of attitudinal meaning and its potential importance for the downstream NLP task of stance detection. We demonstrate that ad hoc compounds in German indeed systematically express attitudinal meaning by adducing corpus-linguistic and psycholinguistic experimental data. However, an investigation of state-of-the-art dependency parsers and Universal Dependencies treebanks shows that German compounds are parsed and annotated very unevenly, so that currently one cannot reliably identify or access ad hoc compounds with attitudinal meaning in texts. Moreover, we report initial experiments with large language models underlining the challenges in capturing attitudinal meanings conveyed by ad hoc compounds. We consequently suggest a systematized way of annotating (and thereby also parsing) ad hoc compounds that is based on positive experiences from within the multilingual ParGram grammar development effort.
Recent years have witnessed a growing interest in investigating what Transformer-based language models (TLMs) actually learn from the training data. This is especially relevant for complex tasks such as the understanding of non-literal meaning. In this work, we probe the performance of three black-box TLMs and two intrinsically transparent white-box models on figurative language classification of sarcasm, similes, idioms, and metaphors. We conduct two studies on the classification results to provide insights into the inner workings of such models. With our first analysis on feature importance, we identify crucial differences in model behavior. With our second analysis using an online experiment with human participants, we inspect different linguistic characteristics of the four figurative language types.
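For readers unfamiliar with feature-importance analysis on white-box models, the following toy sketch fits a transparent classifier on a handful of made-up literal and figurative sentences and ranks the learned weights. The features, data, and model choice are illustrative assumptions and do not reproduce the paper's setup or results.

```python
# Toy sketch of white-box classification with feature-importance inspection.
# Sentences, labels, and the TF-IDF + logistic regression pipeline are
# illustrative assumptions, not the paper's models or data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "Oh great, another Monday meeting.",        # sarcasm
    "Her smile was like sunshine after rain.",  # simile
    "He finally kicked the bucket.",            # idiom
    "The committee approved the budget.",       # literal
]
labels = ["figurative", "figurative", "figurative", "literal"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Rank features by the magnitude of their learned weights
weights = clf.coef_[0]
features = np.array(vectorizer.get_feature_names_out())
top = np.argsort(np.abs(weights))[::-1][:10]
for name, w in zip(features[top], weights[top]):
    print(f"{name:25s} {w:+.3f}")
```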
In the social sciences, recent years have witnessed a growing interest in applying NLP approaches to automatically detect framing in political discourse. However, most NLP studies to date focus heavily on framing effects arising from topic coverage, whereas framing effects arising from the subtle usage of linguistic devices remain understudied. In a collaboration with political science researchers, we investigate framing strategies in German newspaper articles on the “European Refugee Crisis”. With the goal of a more in-depth framing analysis, we not only incorporate lexical cues for shallow topic-related framing, but also propose and operationalize a variety of framing-relevant semantic and pragmatic devices, which are theoretically derived from linguistics and political science research. We demonstrate the influential role of these linguistic devices with a large-scale quantitative analysis, bringing novel insights into the linguistic properties of framing.
Earlier NLP studies on framing in political discourse have focused heavily on shallow classification of issue framing, while framing effects arising from pragmatic cues remain neglected. We put forward this latter type of framing as “pragmatic framing”. To bridge this gap, we take presupposition-triggering adverbs such as ‘again’ as a case study, and quantitatively investigate how different German newspapers use them to covertly evoke different attitudinal subtexts in their reporting on the “European Refugee Crisis” (2014-2018). Our study demonstrates the crucial role of presuppositions in framing, and emphasizes the need for more attention to pragmatic framing in research on automated framing detection.
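The general recipe behind such a quantitative analysis can be illustrated with a few lines of counting code. The trigger list, the outlet names, and the sentences below are invented for illustration and do not reflect the paper's lexicon or corpus.

```python
# Toy illustration: count presupposition-triggering adverbs per newspaper and
# normalise by token count. Trigger list, outlets, and sentences are
# hypothetical examples, not the paper's data.
from collections import Counter

TRIGGERS = {"wieder", "erneut", "noch", "schon"}  # e.g. 'again', 'once more'

corpus = {
    "Zeitung A": ["Wieder kamen tausende Geflüchtete an .",
                  "Die Regierung verspricht schnelle Hilfe ."],
    "Zeitung B": ["Die Kommunen organisieren Unterkünfte .",
                  "Erneut wird über Obergrenzen diskutiert ."],
}

for outlet, sentences in corpus.items():
    tokens = [tok.lower() for sent in sentences for tok in sent.split()]
    counts = Counter(tok for tok in tokens if tok in TRIGGERS)
    rate = sum(counts.values()) / len(tokens)
    print(f"{outlet}: {dict(counts)} (rate per token: {rate:.3f})")
```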
This paper describes the submission of the team KonTra to the CMCL 2021 Shared Task on eye-tracking prediction. Our system combines the embeddings extracted from a fine-tuned BERT model with surface, linguistic and behavioral features, resulting in an average mean absolute error of 4.22 across all 5 eye-tracking measures. We show that word length and features representing the expectedness of a word are consistently the strongest predictors across all 5 eye-tracking measures.
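The system description boils down to a simple recipe: concatenate BERT embeddings with handcrafted features and regress each eye-tracking measure. The sketch below shows one minimal way to do this; the backbone checkpoint, the surface features, the regressor, and the toy targets are all assumptions for illustration and not KonTra's exact configuration.

```python
# Minimal sketch: BERT embeddings + surface features as input to a regressor
# for one eye-tracking measure. Backbone, features, regressor, and toy data
# are illustrative assumptions, not the team's submitted system.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

MODEL_NAME = "bert-base-uncased"  # assumed backbone (the team fine-tuned BERT)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)

def word_features(words):
    """One row per word: mean-pooled subword embedding + simple surface features."""
    rows = []
    for word in words:
        ids = tokenizer(word, return_tensors="pt", add_special_tokens=False)
        with torch.no_grad():
            emb = bert(**ids).last_hidden_state.mean(dim=1).squeeze(0).numpy()
        surface = np.array([len(word), float(word[0].isupper())])  # length, capitalisation
        rows.append(np.concatenate([emb, surface]))
    return np.stack(rows)

# Toy training data: words paired with one eye-tracking measure (values made up)
train_words = ["the", "government", "announced", "unexpectedly", "new", "measures"]
train_targets = np.array([1.2, 4.5, 3.9, 5.8, 1.9, 4.1])

regressor = Ridge(alpha=1.0).fit(word_features(train_words), train_targets)
print(regressor.predict(word_features(["parliament", "said"])))
```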