Stephanie Gross
2026
Ragability Benchmark: A Dataset and Library to Test LLMs on Inter-context Conflicts
Stephanie Gross | Johann Petrak | Brigitte Krenn
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Knowledge conflicts are a challenging issue when applying retrieval-augmented generation (RAG) systems. In this paper, we propose a benchmark to test how LLMs deal with inter-context knowledge conflicts where implicit reasoning is required to resolve the conflict. Based on actual empirical examples, real entities are replaced by fantasy entities to ensure that the model’s internal knowledge does not influence how it deals with external conflicting information. The proposed benchmark can be used to assess current LLMs, but it can also be flexibly adapted for in-depth evaluation of a specific RAG system on selected aspects of conflict identification. We also present an experiment in which we apply the benchmark to test seven current LLMs from different model families. The results show that LLMs are able to identify conflicting contexts (‘Is there a contradiction, yes or no?’), while they struggle to answer content-related queries. Adding a hint that there might be a contradiction in the provided contexts increases the performance of conflict identification for contradictory contexts, while it significantly decreases performance for non-contradictory contexts.
2024
GermEval2024 Shared Task: GerMS-Detect – Sexism Detection in German Online News Fora
Stephanie Gross | Johann Petrak | Louisa Venhoff | Brigitte Krenn
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)
Analysing Effects of Inducing Gender Bias in Language Models
Stephanie Gross | Brigitte Krenn | Craig Lincoln | Lena Holzwarth
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)
Brigitte Krenn | Johann Petrak | Stephanie Gross
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)
2020
Linguistic, Kinematic and Gaze Information in Task Descriptions: The LKG-Corpus
Tim Reinboth | Stephanie Gross | Laura Bishop | Brigitte Krenn
Proceedings of the Twelfth Language Resources and Evaluation Conference
Data from neuroscience and psychology suggest that sensorimotor cognition may be of central importance to language. Specifically, the linguistic structure of utterances referring to concrete actions may reflect the structure of the sensorimotor processing underlying the same action. To investigate this, we present the Linguistic, Kinematic and Gaze information in task descriptions Corpus (LKG-Corpus), comprising multimodal data from 13 humans performing take, put, and push actions and describing these actions in 350 utterances. Audio, video, motion and eye-tracking data were recorded while participants performed an action and described what they did. The dataset is annotated with orthographic transcriptions of the utterances and information on: (a) gaze behaviours, (b) when a participant touched an object, (c) when an object was moved, (d) when a participant looked at the location s/he would next move the object to, and (e) when the participant’s gaze was stable on an area. With the exception of the stable-gaze annotation, all annotations were performed manually. With the LKG-Corpus, we present a dataset that integrates linguistic, kinematic and gaze data with an explicit focus on the relations between action and language. On this basis, we outline applications of the dataset to both basic and applied research.