uogal at BEA 2026 Shared Task 1: Ensemble of Multilingual Encoders with NMT Augmentation for L1-Aware Vocabulary Difficulty Prediction
Bernardo Stearns, John P. McCrae, Thomas Gaillat, Jefkine Kafunah
Abstract
We submit a system for the closed track of the BEA 2026 shared task on L1-aware vocabulary difficulty prediction (Spanish, German, Mandarin Chinese). We compared three families of approaches: hand-crafted tabular features with tree-based regressors, fine-tuned multilingual encoders, and decoder-based artificial learner simulation using LoRA-tuned Pythia models, each evaluated with and without NMT-augmented English context. Our best system is an ensemble of four base and four NMT-augmented multilingual encoders combined through per-language stacking (Nelder-Mead and ElasticNet meta-learner), which placed 2nd in the closed track across all three languages. We also report a monotonic scaling study of the decoder-based artificial learner simulation pipeline.
- Anthology ID:
- 2026.bea-1.75
- Volume:
- Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
- Venues:
- BEA | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1065–1076
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.75/
- Cite (ACL):
- Bernardo Stearns, John P. McCrae, Thomas Gaillat, and Jefkine Kafunah. 2026. uogal at BEA 2026 Shared Task 1: Ensemble of Multilingual Encoders with NMT Augmentation for L1-Aware Vocabulary Difficulty Prediction. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 1065–1076, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- uogal at BEA 2026 Shared Task 1: Ensemble of Multilingual Encoders with NMT Augmentation for L1-Aware Vocabulary Difficulty Prediction (Stearns et al., BEA 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.75.pdf