Abstract
In this paper, we describe our submission to the GermEval 2022 Shared Task on Text Complexity Assessment of German Text. It addresses the problem of predicting the complexity of German sentences on a continuous scale. While many related works still rely on handcrafted statistical features, neural networks have emerged as the state of the art in other natural language processing tasks. Therefore, we investigate how both can complement each other and which features are most relevant for text complexity prediction in German. We propose a fine-tuned German DistilBERT model enriched with statistical text features that achieved fourth place in the shared task with an RMSE of 0.481 on the competition’s test data.
- Anthology ID:
- 2022.germeval-1.4
- Volume:
- Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
- Month:
- September
- Year:
- 2022
- Address:
- Potsdam, Germany
- Venue:
- GermEval
- Publisher:
- Association for Computational Linguistics
- Pages:
- 21–26
- URL:
- https://aclanthology.org/2022.germeval-1.4
- Cite (ACL):
- Miriam Anschütz and Georg Groh. 2022. TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, pages 21–26, Potsdam, Germany. Association for Computational Linguistics.
- Cite (Informal):
- TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction (Anschütz & Groh, GermEval 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.germeval-1.4.pdf