Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Kritarth Prasad; Mohammadi Zaki; Pratik Rakesh Singh; Pankaj Wasnik

Abstract

Ensembling neural machine translation (NMT) models to produce translations of higher quality than any of the L individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB). These methods require inference across all candidate models, which incurs significant computational overhead, generally Ω(L). This paper introduces SmartGen, a reinforcement learning (RL)-based strategy that improves the CSB by selecting a small, fixed number of candidates and identifying optimal groups to pass to the fusion block for each input sentence. In previous work, the CSB and FB were trained independently, leading to suboptimal NMT performance. Our DQN-based SmartGen addresses this by using feedback from the FB as a reward during training. We also resolve a key issue in earlier methods, where candidates were passed to the FB without modification, by introducing a Competitive Correction Block (CCB). Finally, we validate our approach with extensive experiments on English-Hindi translation in both directions, as well as English-to-Chinese and English-to-German translation.
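
The selection idea in the abstract (choose a small, fixed-size group of candidate models and learn from a quality signal on the fused output) can be illustrated with a minimal sketch. This is not the authors' SmartGen implementation: it swaps the DQN for a tabular, bandit-style epsilon-greedy learner, ignores the input sentence entirely, and uses a hypothetical `toy_fusion_reward` function with made-up per-model quality scores in place of real fusion block feedback. All names and numbers below are assumptions for illustration only.

```python
import itertools
import random


def train_subset_selector(num_models, k, reward_fn,
                          episodes=2000, epsilon=0.3, lr=0.5, seed=0):
    """Epsilon-greedy value learning over all size-k groups of models.

    Tabular stand-in for a learned Q-network: one value per group,
    with no dependence on the input sentence.
    """
    rng = random.Random(seed)
    actions = list(itertools.combinations(range(num_models), k))
    q = {action: 0.0 for action in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)       # explore a random group
        else:
            action = max(actions, key=q.get)   # exploit the best-known group
        reward = reward_fn(action)             # feedback on the fused output
        q[action] += lr * (reward - q[action])  # move value toward reward
    return q


def toy_fusion_reward(group):
    """Hypothetical reward: mean of made-up per-model quality scores."""
    quality = [0.2, 0.9, 0.4, 0.8]  # illustrative values, not measured data
    return sum(quality[i] for i in group) / len(group)


q_values = train_subset_selector(num_models=4, k=2, reward_fn=toy_fusion_reward)
best_group = max(q_values, key=q_values.get)
print(best_group)
```

Only the k selected models would need to run at inference time in such a setup, which is the source of the speed-up the abstract describes. On this toy reward the learned values favor the group containing models 1 and 3, the two with the highest assumed quality.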

Anthology ID: 2025.findings-naacl.466
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8322–8335
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.466/
Cite (ACL): Kritarth Prasad, Mohammadi Zaki, Pratik Rakesh Singh, and Pankaj Wasnik. 2025. Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8322–8335, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction (Prasad et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.466.pdf
