Anubhab Parashar

2026

Token Titans at BEA 2026 Shared Task 1: Multilingual Lexical Complexity Prediction via Fine-Tuned XLM-RoBERTa with Ensemble Decoding
Anubhab Parashar | Sandeep Mathias
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)

We describe our submission to the BEA 2026 Shared Task on Multilingual Lexical Complexity Prediction. The system fine-tunes XLM-RoBERTa Large separately for Spanish, German, and Chinese. Each instance is fed as a flat concatenation of the source word, its sentential context, an English clue, and the English target word. Training uses z-score label normalization and two independent runs that differ in learning rate, scheduler, and random seed. A weighted ensemble of their predictions (0.6/0.4) consistently reduces variance on the validation set. On the official test set, the system achieves an RMSE of 1.170 and a Pearson correlation of 0.812.
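The sketch below shows one way to implement the non-model parts of the pipeline the abstract describes. It covers the flat input concatenation, z-score label normalization and its inverse, the 0.6/0.4 weighted ensemble of two runs, and the two reported metrics (RMSE and Pearson). It is illustrative only and is not the authors' code. Several details are assumptions. The `" | "` separator in the hypothetical `build_input` helper is invented, since the abstract does not give the delimiter. The 0.6 weight is assumed to go to the first run. Normalization uses the population standard deviation of the training labels. Fine-tuning XLM-RoBERTa itself is out of scope.

```python
import math
from statistics import mean, pstdev


def build_input(word, context, clue, target, sep=" | "):
    # Hypothetical helper: the four fields are concatenated flatly, but the
    # separator " | " is an assumption, not the authors' actual delimiter.
    return sep.join([word, context, clue, target])


def zscore_fit(labels):
    # Mean and population standard deviation of the training labels
    # (population vs. sample std is an assumption).
    return mean(labels), pstdev(labels)


def zscore(labels, mu, sigma):
    # Normalize labels into z-space for regression training.
    return [(y - mu) / sigma for y in labels]


def unzscore(preds, mu, sigma):
    # Map model outputs from z-space back to the original label scale.
    return [p * sigma + mu for p in preds]


def weighted_ensemble(preds_a, preds_b, w_a=0.6):
    # Assumption: run A gets weight 0.6 and run B gets the remaining 0.4.
    w_b = 1.0 - w_a
    return [w_a * a + w_b * b for a, b in zip(preds_a, preds_b)]


def rmse(preds, gold):
    # Root mean squared error between predictions and gold labels.
    return math.sqrt(mean((p - g) ** 2 for p, g in zip(preds, gold)))


def pearson(x, y):
    # Pearson correlation coefficient computed from centered sums.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0, 4.0]  # toy labels, not shared-task data
    mu, sigma = zscore_fit(gold)
    run_a = zscore([1.2, 1.8, 3.1, 3.9], mu, sigma)  # toy z-space outputs
    run_b = zscore([0.8, 2.3, 2.7, 4.2], mu, sigma)
    ensemble = unzscore(weighted_ensemble(run_a, run_b), mu, sigma)
    print(rmse(ensemble, gold), pearson(ensemble, gold))
```

The ensemble weights sum to 1 and de-normalization is an affine map. So averaging in z-space and then de-normalizing gives the same result as de-normalizing each run first. The order of these two steps therefore does not matter for this sketch.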

Co-authors

Sandeep Mathias (1)

Venues

BEA (1)
WS (1)
