A Memory-Sensitive Classification Model of Errors in Early Second Language Learning

Brendan Tomoschuk, Jarrett Lovelett


Abstract
In this paper, we explore a variety of linguistic and cognitive features to better understand second language acquisition in early users of the language-learning app Duolingo. Using these features, we trained a random forest classifier to predict errors made by early learners of French, Spanish, and English. Of particular note is our finding that the mean and variance of error for each user and each token can serve as a memory-efficient replacement for the corresponding dummy-encoded categorical variables. On the test set, these models improved over the baseline model, with AUROC values of 0.803 for English, 0.823 for French, and 0.829 for Spanish.
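To illustrate the encoding idea mentioned in the abstract, here is a minimal sketch of replacing a high-cardinality categorical variable (such as a user ID or a token) with two numeric columns: the mean and the variance of that category's observed errors. This is not the authors' code. The function names `error_stats` and `encode`, the toy data, the use of population variance, and the `(0.0, 0.0)` fallback for unseen categories are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict
from statistics import mean, pvariance


def error_stats(keys, errors):
    """Map each category (e.g. a user ID or a token) to the
    (mean, population variance) of its 0/1 error labels."""
    grouped = defaultdict(list)
    for key, err in zip(keys, errors):
        grouped[key].append(float(err))
    return {key: (mean(vals), pvariance(vals)) for key, vals in grouped.items()}


def encode(keys, stats, default=(0.0, 0.0)):
    """Replace each category with its two summary statistics.
    The default for unseen categories is an assumption of this sketch."""
    return [stats.get(key, default) for key in keys]


# Hypothetical toy training data: one row per token a learner attempted.
users = ["u1", "u1", "u2", "u2", "u2", "u3"]
errors = [1, 0, 0, 0, 1, 1]  # 1 = the learner made an error on that token

user_stats = error_stats(users, errors)
features = encode(users, user_stats)

# Dummy encoding needs one column per distinct user;
# the summary-statistic encoding always needs two columns.
dummy_width = len(set(users))
stats_width = len(features[0])
```

With dummy encoding, the feature matrix grows by one column for every distinct user or token, whereas this encoding stays at two columns per variable regardless of cardinality. In practice the statistics would normally be estimated on training data only, so that test labels do not leak into the features.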
Anthology ID: W18-0527
Volume: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month: June
Year: 2018
Address: New Orleans, Louisiana
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 231–239
URL: https://aclanthology.org/W18-0527
DOI: 10.18653/v1/W18-0527
Cite (ACL): Brendan Tomoschuk and Jarrett Lovelett. 2018. A Memory-Sensitive Classification Model of Errors in Early Second Language Learning. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 231–239, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal): A Memory-Sensitive Classification Model of Errors in Early Second Language Learning (Tomoschuk & Lovelett, BEA 2018)
PDF: https://preview.aclanthology.org/ingestion-script-update/W18-0527.pdf