Tianjiao Li
2026
Rethinking Prompt Optimizers: From Prompt Merits to Optimization
Zixiao Zhu | Hanzhang Zhou | Zijian Feng | Tianjiao Li | Chua Jia Jim Deryl | Lee Onn Mak | Gee Wah Ng | Kezhi Mao
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Prompt optimization (PO) provides a practical way to improve response quality when users lack the time or expertise to manually craft effective prompts. Existing methods typically rely on LLMs’ self-generation ability to optimize prompts. However, due to limited downward compatibility, the instruction-heavy prompts generated by advanced LLMs can overwhelm lightweight inference models and degrade response quality, while also lacking interpretability due to implicit optimization. In this work, we rethink prompt optimization through the lens of explicit and interpretable design. We first identify a set of model-agnostic prompt quality merits and empirically validate their effectiveness in enhancing prompt and response quality. We then introduce MePO, a merit-guided, locally deployable prompt optimizer trained on our merit-guided prompt preference dataset generated by a lightweight LLM. MePO avoids online optimization, reduces privacy concerns, and, by learning clear, interpretable merits, generalizes effectively to both large-scale and lightweight inference models. Experiments demonstrate that MePO achieves better results across diverse tasks and model types, offering a scalable and robust solution for real-world deployment. The code, model, and dataset are available at https://github.com/MidiyaZhu/MePO.
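The abstract describes training a local optimizer on merit-guided prompt preference pairs. The sketch below illustrates one possible shape for such data: a raw prompt paired with a preferred (merit-aligned) rewrite and a dispreferred one. The merit names, record fields, and placeholder scoring function are illustrative assumptions, not MePO's actual design; the released dataset at the link above defines the real format.

```python
# A minimal sketch of assembling merit-guided prompt preference pairs.
# The merit names, record fields, and the placeholder scoring function are
# illustrative assumptions, not MePO's actual definitions.
from dataclasses import dataclass, asdict
import json

# Hypothetical model-agnostic prompt-quality merits used to guide rewriting.
MERITS = ["clarity", "conciseness", "completeness"]

@dataclass
class PreferencePair:
    raw_prompt: str   # the user's original prompt
    chosen: str       # merit-aligned rewrite (preferred)
    rejected: str     # lower-quality rewrite (dispreferred)

def merit_score(rewrite: str) -> int:
    """Placeholder score: count merit keywords the rewrite explicitly mentions.
    A real pipeline would instead ask a lightweight LLM to judge each merit."""
    return sum(1 for merit in MERITS if merit in rewrite.lower())

def build_pair(raw_prompt: str, rewrites: list[str]) -> PreferencePair:
    """Keep the best- and worst-scoring rewrites as one preference pair."""
    ranked = sorted(rewrites, key=merit_score, reverse=True)
    return PreferencePair(raw_prompt, chosen=ranked[0], rejected=ranked[-1])

if __name__ == "__main__":
    # Toy candidates; in practice they could come from a lightweight LLM
    # prompted with the merit descriptions.
    candidates = [
        "For clarity and completeness: summarize the article below in three bullet points.",
        "You are a world-class expert; think step by step and summarize.",
    ]
    pair = build_pair("summarize this article", candidates)
    print(json.dumps(asdict(pair), indent=2))
```

Pairs of this form could then be fed to any standard preference-optimization trainer to produce the locally deployable optimizer; the paper's actual training objective may differ.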
2025
RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
Tianjiao Li | Mengran Yu | Chenyu Shi | Yanjun Zhao | Xiaojing Liu | Qi Zhang | Xuanjing Huang | Qiang Zhang | Jiayin Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) possess strong multilingual capabilities, and combining Reinforcement Learning from Human Feedback (RLHF) with translation tasks has shown great potential. However, we observe that this paradigm performs unexpectedly poorly when applied to colloquial subtitle translation tasks. In this work, we investigate this issue and find that the offline reward model (RM) gradually diverges from the online LLM due to distributional shift, ultimately leading to undesirable training outcomes. To address this, we propose RIVAL, an adversarial training framework that formulates the process as a min–max game between the RM and the LLM. RIVAL iteratively updates both models, with the RM trained to distinguish strong from weak translations (qualitative preference reward) and the LLM trained to improve its translations to close this gap. To stabilize training and improve generalizability, we also incorporate a quantitative preference reward (e.g., BLEU) into the RM, enabling reference-free quality modeling aligned with human evaluation. Through extensive experiments, we demonstrate that the proposed training framework significantly improves upon translation baselines.
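Read as a sketch (the paper's exact objective may differ), the min–max game above can be written with a reward model r_phi and a translation policy pi_theta: the RM is trained to widen the margin between strong translations y^s and the policy's own outputs, while the policy is trained to close it. The quantitative preference reward would add, e.g., a BLEU-anchored term to the RM's training signal.

```latex
% A sketch of the adversarial objective implied by the abstract; notation is assumed.
\min_{\theta}\ \max_{\phi}\;
\mathbb{E}_{x\sim\mathcal{D}}\Big[
    r_{\phi}\big(x,\,y^{s}(x)\big)
    \;-\;
    \mathbb{E}_{y\sim\pi_{\theta}(\cdot\mid x)}\big[\,r_{\phi}(x,\,y)\,\big]
\Big]
```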