BoostedCats at BEA 2026 Shared Task 1: What Makes a Word Hard to Learn? Modeling L1 Influence on English Vocabulary Difficulty
Jonas Mayer Martins, Zhuojing Huang, Aaricia Herygers, Lisa Beinborn
Abstract
What makes a word difficult to learn, and how does the difficulty depend on the learner’s native language? We computationally model vocabulary difficulty for English learners whose first language is Spanish, German, or Chinese with gradient-boosted models trained on features related to a word’s familiarity (e.g., frequency), meaning, surface form, and cross-linguistic transfer. Using Shapley values, we determine the importance of each feature group. Word familiarity is the dominant feature group shared by all three languages. However, predictions for Spanish- and German-speaking learners rely additionally on orthographic transfer. This transfer mechanism is unavailable to Chinese learners, whose difficulty is shaped by a combination of familiarity and surface features alone. Our models provide interpretable, L1-tailored difficulty estimates that can be used to design vocabulary curricula.- Anthology ID:
- 2026.bea-1.74
- Volume:
- Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
- Venues:
- BEA | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1047–1064
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.74/
- DOI:
- Cite (ACL):
- Jonas Mayer Martins, Zhuojing Huang, Aaricia Herygers, and Lisa Beinborn. 2026. BoostedCats at BEA 2026 Shared Task 1: What Makes a Word Hard to Learn? Modeling L1 Influence on English Vocabulary Difficulty. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 1047–1064, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- BoostedCats at BEA 2026 Shared Task 1: What Makes a Word Hard to Learn? Modeling L1 Influence on English Vocabulary Difficulty (Mayer Martins et al., BEA 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.74.pdf