Jefkine Kafunah
2026
uogal at BEA 2026 Shared Task 1: Ensemble of Multilingual Encoders with NMT Augmentation for L1-Aware Vocabulary Difficulty Prediction
Bernardo Stearns | John P. McCrae | Thomas Gaillat | Jefkine Kafunah
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
We present our submission to the closed track of the BEA 2026 shared task on L1-aware vocabulary difficulty prediction (Spanish, German, Mandarin Chinese). We compare three families of approaches: tree-based regressors over hand-crafted tabular features, fine-tuned multilingual encoders, and decoder-based artificial learner simulation with LoRA-tuned Pythia models. Each family is evaluated with and without NMT-augmented English context. Our best system is an ensemble of four base and four NMT-augmented multilingual encoders, combined through per-language stacking (Nelder-Mead and an ElasticNet meta-learner). It placed 2nd in the closed track across all three languages. We also report a scaling study of the decoder-based artificial learner simulation pipeline, which shows monotonic scaling behavior.
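The abstract does not say exactly how the Nelder-Mead step and the ElasticNet meta-learner interact. The sketch below is one possible reading, assuming each base encoder contributes out-of-fold difficulty scores for every item of a given language. Under that reading, both combiners are fit independently per language. Nelder-Mead searches for blending weights that minimise mean squared error, and an ElasticNet is fit on the same scores. Several details are illustrative assumptions and not the authors' implementation: the function names (`nelder_mead`, `elastic_net`, `stack_per_language`), the MSE objective, the hyperparameters (`alpha=0.01`, `l1_ratio=0.5`), and the choice to return both fits rather than a single combined prediction. The code uses only the standard library so that it stays self-contained. In practice, `scipy.optimize.minimize(..., method="Nelder-Mead")` and `sklearn.linear_model.ElasticNet` provide the same building blocks.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


def predict_linear(preds: Sequence[Sequence[float]], coefs: Sequence[float],
                   intercept: float = 0.0) -> Vector:
    """Linear combination of base-model scores; preds[i][k] = model k on item i."""
    return [intercept + sum(c * p for c, p in zip(coefs, row)) for row in preds]


def nelder_mead(f: Callable[[Vector], float], x0: Vector,
                step: float = 0.1, iters: int = 500) -> Vector:
    """Standard Nelder-Mead simplex search (reflect, expand, contract, shrink)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        f_best, f_second, f_worst = f(best), f(second_worst), f(worst)
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - x) for c, x in zip(centroid, worst)]
        f_ref = f(reflected)
        if f_ref < f_best:
            expanded = [c + 2.0 * (c - x) for c, x in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f_ref else reflected
        elif f_ref < f_second:
            simplex[-1] = reflected
        else:
            if f_ref < f_worst:  # outside contraction
                contracted = [c + 0.5 * (r - c) for c, r in zip(centroid, reflected)]
                accept = f(contracted) <= f_ref
            else:  # inside contraction
                contracted = [c + 0.5 * (x - c) for c, x in zip(centroid, worst)]
                accept = f(contracted) < f_worst
            if accept:
                simplex[-1] = contracted
            else:  # shrink every vertex toward the best one
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, vertex)]
                    for vertex in simplex[1:]
                ]
    return min(simplex, key=f)


def soft_threshold(a: float, t: float) -> float:
    """Proximal operator of the L1 penalty."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                alpha: float = 0.01, l1_ratio: float = 0.5,
                sweeps: int = 200) -> Tuple[Vector, float]:
    """Coordinate descent on the scikit-learn-style ElasticNet objective:
    1/(2n)||y - Xw - b||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2
    """
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    # Centering lets the unpenalised intercept be recovered at the end.
    cols = [[row[j] - x_mean[j] for row in X] for j in range(k)]
    resid = [v - y_mean for v in y]  # residual of centred y while w = 0
    w = [0.0] * k
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # a constant column carries no signal
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_wj - w[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            w[j] = new_wj
    intercept = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return w, intercept


def stack_per_language(oof: Dict[str, Tuple[List[Vector], Vector]]) -> Dict[str, dict]:
    """Fit both (hypothetical) meta-learners separately for each language.

    oof maps a language code to (out-of-fold base predictions, gold scores).
    """
    results = {}
    for lang, (preds, gold) in oof.items():
        k = len(preds[0])
        # Derivative-free search for MSE-minimising blend weights, from a uniform average.
        weights = nelder_mead(lambda w: mse(predict_linear(preds, w), gold), [1.0 / k] * k)
        coefs, intercept = elastic_net(preds, gold)
        results[lang] = {"nm_weights": weights, "enet_coefs": coefs,
                         "enet_intercept": intercept}
    return results
```

As an example, a call could pass `{"es": (preds, gold), "de": ..., "zh": ...}` (hypothetical keys), where each row of `preds` holds the eight encoders' out-of-fold scores for one item. The call would then return one set of weights and coefficients per language. Fitting the combiners language by language, rather than pooling all items, lets each language weight the base and NMT-augmented encoders differently.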