Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations
Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, Antonios Anastasopoulos
Abstract
Language transfer is an important topic of research in second language acquisition (SLA) and computational linguistics. The availability of suitable learner corpora is paramount for the study of SLA and language transfer. However, curating learner corpora is a challenging endeavor, as high-quality learner data is rarely publicly available. As a result, only a few such corpora are available to the community. To address this important gap, in this paper we present LENS, a novel English learner corpus with longitudinal data which enables researchers to investigate language learning over time. LENS contains 687 instances written by speakers of 15 different L1s. We use LENS to perform two important tasks at the intersection of SLA and computational linguistics: (1) Native Language Identification (NLI); and (2) an evaluation of large language models as a tool for high-precision, semi-automated annotation of L1 interference features.
- Anthology ID:
- 2025.emnlp-main.766
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15157–15178
- URL:
- https://preview.aclanthology.org/author-page-bin-wu-ucl/2025.emnlp-main.766/
- DOI:
- 10.18653/v1/2025.emnlp-main.766
- Cite (ACL):
- Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, and Antonios Anastasopoulos. 2025. Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 15157–15178, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations (Acharya et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/author-page-bin-wu-ucl/2025.emnlp-main.766.pdf