Lily Goulder
2026
L1 Influence in L2 Language Models: A Human-centric Approach
Laura Barbenel | Lily Goulder | Aoife O’Driscoll | Suchir Salhan | Catherine Arnett | Andrew Caines | Paula Buttery
Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL)
Language learners typically exhibit first language (L1) influence in their written second language (L2) production. We investigate whether similar patterns emerge in L2 language models (L2LMs), which are typically assessed on task-based benchmarks rather than on language use. We evaluate the use of Native Language Identification (NLI) as a method for detecting whether L2LMs exhibit human-like L1 influence. Using existing learner corpora and our novel L2 English dataset, we identify the conditions that yield the highest NLI accuracy, and show that text length, but not proficiency, affects performance. We then apply NLI to L2LM-generated text under various instruction-tuning and prompting conditions. We find that instruction tuning on human learner essays yields high NLI accuracy (~90%) and is necessary for detectable L1 influence. Whilst NLI accuracy is similar for L2LM and human essays, human evaluation shows that LM-generated L1 influence remains distinguishable from human writing.