Ye Shen




2025

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Ammar Khairi | Daniel D’souza | Ye Shen | Julia Kreutzer | Sara Hooker
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advancements in large language models (LLMs) have shifted focus toward scaling inference-time compute, which improves performance without retraining the model. A common approach is to sample multiple outputs in parallel and select one of them as the final output. While existing work has focused on English and specific domains, we study how to robustly scale inference-time compute in a multilingual, multi-task setting that spans open-ended generation, math, and translation tasks, for open models at 8B and 111B scale, across seven languages. Our findings highlight the need for tailored sampling and selection strategies. We propose novel solutions tailored for this multi-faceted inference scenario, demonstrating notable gains across languages and tasks. Our methods achieve an average +6.8 jump in win-rates for 8B models on m-ArenaHard-v2.0 prompts in non-English languages against proprietary models like Gemini. At larger scale, our 111B model shows a +9.0 improvement with just five samples compared to single-sample decoding. These results emphasize the importance of language- and task-aware approaches to democratize inference-time improvements.
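
The "sample multiple outputs, then select one" pattern mentioned in the abstract can be illustrated with a generic best-of-N sketch. This is not the paper's method: the abstract does not describe its tailored sampling or selection strategies, so `toy_sample` and `toy_score` below are hypothetical stand-ins for a real model call and a real selection criterion (for example a reward model or judge), and `best_of_n` is an illustrative name.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    sample_fn: Callable[[str, random.Random], str],
    score_fn: Callable[[str, str], float],
    n: int = 5,
    seed: int = 0,
) -> Tuple[str, List[str]]:
    """Draw n candidates for a prompt and return the highest-scoring one.

    Generic best-of-N selection; sample_fn and score_fn are supplied by
    the caller and are assumptions, not components from the paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    candidates = [sample_fn(prompt, rng) for _ in range(n)]
    scores = [score_fn(prompt, c) for c in candidates]
    # Pick the index of the highest score; ties go to the earliest candidate.
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index], candidates


def toy_sample(prompt: str, rng: random.Random) -> str:
    """Hypothetical sampler: repeats the prompt a random number of times."""
    return " ".join([prompt] * rng.randint(1, 5))


def toy_score(prompt: str, candidate: str) -> float:
    """Hypothetical scorer: prefers longer candidates."""
    return float(len(candidate))


if __name__ == "__main__":
    best, all_candidates = best_of_n("hola", toy_sample, toy_score, n=5)
    print(f"{len(all_candidates)} candidates sampled; selected: {best!r}")
```

In a real setting, the sampler would call a language model with some decoding temperature and the scorer would encode a task- or language-aware preference; which choices work best is the question the paper studies.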