Sofia Kathmann
2026
Criterial Features in German: Towards Interpretable NLP in Readability Assessment
Denise Loefflad | Sofia Kathmann | Heiko Holz | Detmar Meurers
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
This paper presents an empirical evaluation of the German Grammar Profile (GGP), a CEFR-aligned resource of criterial features, and its corresponding extraction system PALME. We design a systematic test suite in which each feature extractor is evaluated on controlled positive and negative examples. The results show that PALME achieves high precision and recall across all CEFR levels, with over 90% of features achieving scores above 0.8. Qualitative analysis shows that lower performance primarily results from morphological ambiguity in noun and adjective case marking. To evaluate the usefulness of the criterial features of the GGP for CEFR-aligned readability assessment, we assess their predictive power using Explainable Boosting Machines on graded readers. The model achieves strong performance (precision: 0.75, recall: 0.73). Our qualitative analysis shows that features related to specific verb constructions follow patterns consistent with developmental stages predicted by Processability Theory. These findings underline the value and relevance of criterial features for modeling language development in readability assessment.
2025
Team KiAmSo at SemEval-2025 Task 11: A Comparison of Classification Models for Multi-label Emotion Detection
Kimberly Sharp | Sofia Kathmann | Amelie Rüeck
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper addresses multi-label emotion detection across a variety of languages as part of Track A of SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. We fine-tune several pre-trained monolingual and multilingual language models and compare their performance on both high-resource and low-resource languages. Overall, we find that monolingual models tend to perform better, but for low-resource languages that lack state-of-the-art pre-trained language models, multilingual models can achieve comparable results.