Token Titans at BEA 2026 Shared Task 1: Multilingual Lexical Complexity Prediction via Fine-Tuned XLM-RoBERTa with Ensemble Decoding

Anubhab Parashar, Sandeep Mathias


Abstract
We describe our submission to the BEA 2026 Shared Task on Multilingual Lexical Complexity Prediction. The system fine-tunes XLM-RoBERTa Large separately for Spanish, German, and Chinese, feeding each instance as a flat concatenation of the source word, its sentential context, an English clue, and the English target word. Training uses z-score label normalization and two independent runs that differ in learning rate, scheduler, and random seed; a weighted ensemble of their predictions (weights 0.6 and 0.4) consistently reduces variance on the validation set. On the official test set the system scores RMSE = 1.170 and Pearson = 0.812.
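The pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The field separator, the helper names (build_input, zscore_fit, zscore, unzscore, ensemble, rmse, pearson), and the toy numbers are hypothetical, and the XLM-RoBERTa fine-tuning itself is omitted. The sketch shows the flat input concatenation, per-language z-score normalization of labels and its inversion, the 0.6/0.4 weighted average of two runs, and the two reported metrics (RMSE and Pearson correlation).

```python
import math
from statistics import mean, pstdev


def build_input(word, context, clue, target, sep=" </s> "):
    # Flat concatenation of the four fields described in the abstract.
    # ASSUMPTION: the separator string is hypothetical; the paper's choice is not stated here.
    return sep.join([word, context, clue, target])


def zscore_fit(labels):
    # Mean and population standard deviation of one language's training labels.
    mu, sigma = mean(labels), pstdev(labels)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant labels


def zscore(labels, mu, sigma):
    # Normalize gold labels before training.
    return [(y - mu) / sigma for y in labels]


def unzscore(preds, mu, sigma):
    # Map normalized model outputs back to the original label scale.
    return [p * sigma + mu for p in preds]


def ensemble(preds_a, preds_b, w_a=0.6, w_b=0.4):
    # Weighted average of two independent runs (0.6 / 0.4 as in the abstract).
    return [w_a * a + w_b * b for a, b in zip(preds_a, preds_b)]


def rmse(preds, gold):
    # Root mean squared error between predictions and gold labels.
    return math.sqrt(mean((p - g) ** 2 for p, g in zip(preds, gold)))


def pearson(preds, gold):
    # Pearson correlation coefficient, computed without third-party libraries.
    mp, mg = mean(preds), mean(gold)
    cov = sum((p - mp) * (g - mg) for p, g in zip(preds, gold))
    sp = math.sqrt(sum((p - mp) ** 2 for p in preds))
    sg = math.sqrt(sum((g - mg) ** 2 for g in gold))
    return cov / (sp * sg)


# Toy usage with made-up labels and made-up normalized outputs of two runs.
gold = [1.0, 2.0, 3.0, 4.0]
mu, sigma = zscore_fit(gold)
run_a = [-1.2, -0.5, 0.4, 1.3]
run_b = [-1.4, -0.4, 0.5, 1.2]
preds = unzscore(ensemble(run_a, run_b), mu, sigma)
print(round(rmse(preds, gold), 3), round(pearson(preds, gold), 3))
```

Because the weights sum to 1 and the z-score inversion is affine, averaging the two runs before or after de-normalization gives the same predictions; the sketch averages in normalized space.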
Anthology ID:
2026.bea-1.80
Volume:
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
1119–1123
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.80/
Cite (ACL):
Anubhab Parashar and Sandeep Mathias. 2026. Token Titans at BEA 2026 Shared Task 1: Multilingual Lexical Complexity Prediction via Fine-Tuned XLM-RoBERTa with Ensemble Decoding. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 1119–1123, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Token Titans at BEA 2026 Shared Task 1: Multilingual Lexical Complexity Prediction via Fine-Tuned XLM-RoBERTa with Ensemble Decoding (Parashar & Mathias, BEA 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.80.pdf