Shubh Sehgal

2026

SAAKTH at BEA 2026 Shared Task 1: L1-Aware English Vocabulary Difficulty Prediction with Hybrid Transformer and Psycholinguistic Features
Karthik Mattu | Adit Dhall | Arshad Naguru | Shubh Sehgal | Thejas Gowda | Hakyung Sung
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)

This paper presents team SAAKTH’s system for the BEA 2026 Shared Task on Vocabulary Difficulty Prediction (Closed Track). We address the key challenge that English word difficulty is not fixed but varies with English learners’ native language. Our approach combines a fine-tuned XLM-RoBERTa-large encoder with handcrafted psycholinguistic features engineered separately for each L1 group. These features are integrated via a shallow multilayer perceptron and optimized separately per L1, with five-seed ensembling and XGBoost-based blending for stability. Our system achieves RMSEs of 0.997 (es), 1.002 (de), and 0.932 (cn) on the development set, improving 20–25% over the baseline. Results highlight the effectiveness of L1-aware modeling under limited data.

Co-authors

Venues

BEA1
WS1

Fix author