Luis Frentzen Salim

2026

Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation
Luis Frentzen Salim | Esteban Carlin | Alexandre Morinvil | Xi Ai | Lun-Wei Ku
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)

Building machine translation (MT) systems for low-resource languages is notably difficult due to the scarcity of high-quality data. Although Large Language Models (LLMs) have improved MT system performance, adapting them to lesser-represented languages remains challenging. In-context learning (ICL) may offer novel ways to adapt LLMs for low-resource MT by conditioning models on demonstrations at inference time. In this study, we explore scaling low-resource machine translation ICL beyond the few-shot setting to thousands of examples with long-context models. We scale the in-context token budget to 1M tokens and compare three types of training corpora used as in-context supervision: monolingual unsupervised data, instruction-style data, and parallel data (English–target and Indonesian–target). Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and can degrade near the maximum context window, with scaling behavior strongly dependent on corpus type. Notably, some forms of monolingual supervision can be competitive with parallel data, despite the latter offering additional supervision. Overall, our results characterize the effective limits and corpus-type sensitivity of long-context ICL for low-resource MT, highlighting that larger context windows do not necessarily yield proportional quality gains.
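To illustrate what scaling demonstrations under a token budget can look like in practice, here is a minimal sketch that greedily packs parallel examples into a prompt until the budget is reached. It is not the authors' pipeline: the function name `pack_demonstrations`, the prompt template, and the whitespace-based token counter are all assumptions made for illustration, and a real setup would use the model's own tokenizer.

```python
from typing import Callable, List, Tuple


def pack_demonstrations(
    pairs: List[Tuple[str, str]],
    query: str,
    token_budget: int,
    # Assumption: whitespace splitting stands in for a real tokenizer.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    src_label: str = "English",
    tgt_label: str = "Javanese",
) -> str:
    """Greedily add (source, target) demonstrations until the budget is hit."""
    # The query is always included, so its cost is reserved first.
    query_block = f"{src_label}: {query}\n{tgt_label}:"
    used = count_tokens(query_block)

    blocks: List[str] = []
    for src, tgt in pairs:
        block = f"{src_label}: {src}\n{tgt_label}: {tgt}\n"
        cost = count_tokens(block)
        if used + cost > token_budget:
            break  # stop at the first demonstration that does not fit
        blocks.append(block)
        used += cost

    return "\n".join(blocks + [query_block])


# Placeholder strings only; no real translations are implied.
demo_pairs = [("source one", "target one"), ("source two", "target two")]
prompt = pack_demonstrations(demo_pairs, "new sentence", token_budget=10)
```

Sweeping `token_budget` over increasing values is one simple way to trace how translation quality changes with context size.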


Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data often used to train multilingual language models. In this paper, we introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. Many of the included languages have been previously under-served, making CommonLID a key resource for developing more representative high-quality text corpora. We show CommonLID’s value by using it, alongside five other common evaluation sets, to test eight popular LID models. We analyse our results to situate our contribution and to provide an overview of the state of the art. In particular, we highlight that existing evaluations overestimate LID accuracy for many languages in the web domain. We make CommonLID and the code used to create it available under an open, permissive license.


Expert Calibration Lens for Pruning Mixture of Experts
Luis Frentzen Salim | Chia-Chun Wu | Tran Van Nhiem | Lun-Wei Ku | Yung-Hui Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Expert pruning is a practical deployment technique for Mixture-of-Experts (MoE) models. It reduces resource usage and mitigates expert redundancy, but its success depends strongly on the calibration set used for pruning. In domain-general settings, it is unclear which properties of the calibration data drive good pruning outcomes, and the effects of calibration perturbations are often unintuitive. We observe, for example, that calibration sets in different languages can lead to very similar pruning results despite appearing dissimilar on the surface. To address this, we propose Expert Calibration Lens, a lightweight analysis tool that compares expert activation patterns across datasets to predict the impact of calibration perturbations without repeatedly running expensive pruning procedures. We use activations that are quick to compute and evaluate the resulting analysis for downstream task performance.
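As a rough illustration of comparing expert activation patterns across two calibration sets, the sketch below turns per-token routed expert IDs into a frequency distribution and measures their distance. The abstract does not specify the comparison metric; Jensen-Shannon divergence is swapped in here as an assumption, and the function names and toy routing lists are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def activation_distribution(expert_ids: Iterable[int], num_experts: int) -> List[float]:
    """Fraction of routed tokens that each expert received."""
    counts = Counter(expert_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing decisions provided")
    return [counts.get(e, 0) / total for e in range(num_experts)]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy routing traces for two hypothetical calibration sets over 4 experts.
set_a = activation_distribution([0, 0, 1, 2, 2, 3], num_experts=4)
set_b = activation_distribution([0, 1, 1, 2, 3, 3], num_experts=4)
distance = js_divergence(set_a, set_b)
```

A small distance would suggest the two calibration sets exercise the experts similarly, which is the kind of signal that could be checked before running a full pruning pass.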
