Abstract
We present the results of our investigations aimed at identifying the most informative linguistic complexity features for classifying language learning levels in three different datasets. The datasets vary across two dimensions: the size of the instances (texts vs. sentences) and the language learning skill they involve (reading comprehension texts vs. texts written by learners themselves). We present a subset of the most predictive features for each dataset, taking into consideration significant differences in their per-class mean values, and show that these subsets lead not only to simpler models but also to improved classification performance. Furthermore, we pinpoint fourteen central features that are good predictors regardless of the size of the linguistic unit analyzed or the skills involved, and that span both morpho-syntactic and lexical dimensions.
- Anthology ID:
- W18-4606
- Volume:
- Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New-Mexico
- Editors:
- Leonor Becerra-Bonache, M. Dolores Jiménez-López, Carlos Martín-Vide, Adrià Torrens-Urrutia
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 49–58
- URL:
- https://aclanthology.org/W18-4606
- Cite (ACL):
- Ildikó Pilán and Elena Volodina. 2018. Investigating the importance of linguistic complexity features across different datasets related to language learning. In Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing, pages 49–58, Santa Fe, New-Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Investigating the importance of linguistic complexity features across different datasets related to language learning (Pilán & Volodina, 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/W18-4606.pdf