Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations
Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, Antonios Anastasopoulos
Abstract
The availability of suitable learner corpora is crucial for studying second language acquisition (SLA) and language transfer. However, curating such corpora is challenging, as high-quality learner data is rarely publicly available. As a result, only a few learner corpora, such as ICLE and TOEFL-11, are accessible to the research community. To address this gap, we present Anonymous, a novel English learner corpus with longitudinal data. The corpus consists of 687 texts written by adult learners taking English as a second language courses in the USA. These learners are either preparing for university admission or enhancing their language proficiency while beginning their university studies. Unlike most learner corpora, Anonymous includes longitudinal data, allowing researchers to explore language learning trajectories over time. The corpus features contributions from speakers of 15 different L1s. We demonstrate the utility of Anonymous through two case studies at the intersection of SLA and Computational Linguistics: (1) Native Language Identification (NLI), and (2) a quantitative and qualitative analysis of linguistic features influenced by L1 using large language models.
- Anthology ID:
- 2025.emnlp-main.766
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15157–15178
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.766/
- Cite (ACL):
- Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, and Antonios Anastasopoulos. 2025. Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 15157–15178, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations (Acharya et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.766.pdf