Ulf A. Hamster


Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022
Ulf A. Hamster
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

The German Text Complexity Assessment Shared Task at KONVENS 2022 explores how to predict a complexity score for example sentences from a language learner’s perspective. Our modeling approach for this shared task uses off-the-shelf NLP tools for feature engineering and a Random Forest regression model. We identified text length, or more precisely the logarithm of a sentence’s string length, as the most important feature for predicting the complexity score. Further analysis showed that the Pearson correlation between text length and complexity score is 𝜌 ≈ 0.777. A sensitivity analysis on the loss function revealed that semantic SBert features affect the complexity score as well.
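
The length feature and the correlation mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's code: the helper names `log_length` and `pearson`, the example sentences, and the scores are hypothetical, and the natural logarithm is an assumption, since the abstract does not state the base. The abstract also leaves open whether the reported correlation uses the raw or the log-transformed length; the sketch uses the log-transformed value.

```python
import math


def log_length(sentence: str) -> float:
    """Natural log of the character count (assumes a non-empty sentence)."""
    return math.log(len(sentence))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sentences and complexity scores, for illustration only.
sentences = [
    "Der Hund schläft.",
    "Die Kinder spielen heute im Garten.",
    "Obwohl es regnete, gingen wir gestern lange im Wald spazieren.",
    "Die Verordnung regelt die Zuständigkeiten der Behörden, sofern keine abweichenden Bestimmungen gelten.",
]
scores = [1.2, 2.5, 3.1, 4.8]

features = [log_length(s) for s in sentences]
rho = pearson(features, scores)
print(f"Pearson correlation on the toy data: {rho:.3f}")
```

On real data, the same feature column could be passed to a regression model together with the other engineered features; the toy correlation printed here says nothing about the reported value of 0.777.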