Jefkine Kafunah


2026

We submit a system for the closed track of the BEA 2026 shared task on L1-aware vocabulary difficulty prediction (Spanish, German, Mandarin Chinese). We compare three families of approaches: hand-crafted tabular features with tree-based regressors, fine-tuned multilingual encoders, and decoder-based artificial learner simulation using LoRA-tuned Pythia models. Each family is evaluated with and without NMT-augmented English context. Our best system is an ensemble of four base and four NMT-augmented multilingual encoders combined through per-language stacking (Nelder-Mead and ElasticNet meta-learner), which placed second in the closed track across all three languages. We also report a monotonic scaling study of the decoder-based artificial learner simulation pipeline.
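To make the per-language stacking idea concrete, the following is a minimal sketch rather than the submitted implementation. It assumes that out-of-fold predictions from each base model are available per language, that the blend is a convex combination of those predictions, and that RMSE is the objective; none of these details are stated above. It covers only the Nelder-Mead part (the ElasticNet meta-learner is omitted), and the function names, the language keys `es`, `de`, `zh`, and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def to_weights(raw):
    """Map an unconstrained vector to non-negative weights that sum to 1."""
    w = np.abs(raw)
    return w / (w.sum() + 1e-12)


def fit_blend_weights(oof_preds, y_true):
    """Search blend weights with Nelder-Mead.

    oof_preds: array of shape (n_items, n_models), out-of-fold predictions.
    y_true:    array of shape (n_items,), gold difficulty scores.
    """
    n_models = oof_preds.shape[1]

    def loss(raw):
        return rmse(y_true, oof_preds @ to_weights(raw))

    start = np.full(n_models, 1.0 / n_models)  # start from a uniform average
    result = minimize(loss, start, method="Nelder-Mead")
    return to_weights(result.x)


def fit_per_language(oof_by_lang, y_by_lang):
    """Fit one independent weight vector per language (assumed design)."""
    return {
        lang: fit_blend_weights(oof_by_lang[lang], y_by_lang[lang])
        for lang in oof_by_lang
    }


# Toy data only: three synthetic "models" with different noise levels.
rng = np.random.default_rng(0)
oof_by_lang, y_by_lang = {}, {}
for lang in ("es", "de", "zh"):  # hypothetical language keys
    y = rng.normal(size=200)
    noise = rng.normal(size=(200, 3)) * np.array([0.2, 0.5, 1.0])
    oof_by_lang[lang] = y[:, None] + noise
    y_by_lang[lang] = y

weights_by_lang = fit_per_language(oof_by_lang, y_by_lang)
```

Fitting the weights separately for each language lets the blend favor whichever base models are most reliable for that L1, at the cost of fewer training items per fit.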