DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding

Jianhao Yan, Jin Xu, Fandong Meng, Jie Zhou, Yue Zhang


Abstract
Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding algorithm in Neural Machine Translation. However, MBR performs poorly with label smoothing, which is surprising because label smoothing provides decent improvements with beam search and improves generality in various tasks. In this work, we show that the issue arises from the inconsistent effect of label smoothing on the token-level and sequence-level distributions. We demonstrate that even though label smoothing causes only a slight change at the token level, the sequence-level distribution is highly skewed. We term this issue autoregressive over-smoothness. To address it, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of output distributions by tuning down the Softmax temperature. We theoretically prove the equivalence between pre-tuning the label smoothing factor and distributional cooling. Extensive experiments on NMT benchmarks validate that distributional cooling improves MBR in various settings.
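The two ingredients named in the abstract, lowering the Softmax temperature and selecting a hypothesis by MBR, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`softmax`, `entropy`, `unigram_f1`, `mbr_select`), the toy logits, the candidate list, and the unigram-F1 utility are all assumptions made for illustration; a real system would draw candidates from an NMT model and would likely use a standard translation metric as the utility.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature < 1 sharpens ("cools") the output."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def unigram_f1(hyp, ref):
    """Toy utility (an assumption): unigram F1 overlap between two token lists."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility):
    """Return the candidate with the highest average utility against the others.

    Expects at least two candidates; ties go to the earliest candidate.
    """
    best, best_score = None, float("-inf")
    for i, hyp in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best


# Hypothetical token logits: cooling (temperature 0.5) lowers the entropy.
logits = [2.0, 1.0, 0.1]
warm = softmax(logits, temperature=1.0)
cool = softmax(logits, temperature=0.5)

# Hypothetical candidate translations standing in for model samples.
candidates = [
    ["the", "cat", "sat"],
    ["the", "cat", "sat", "down"],
    ["the", "cat", "slept"],
    ["a", "dog", "ran"],
]
choice = mbr_select(candidates, unigram_f1)
```

In this sketch the cooled distribution `cool` has lower entropy than `warm`, and `mbr_select` prefers the candidate that agrees most with the rest of the pool, which is the consensus behaviour MBR relies on.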
Anthology ID:
2024.lrec-main.395
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
4423–4437
URL:
https://aclanthology.org/2024.lrec-main.395
Cite (ACL):
Jianhao Yan, Jin Xu, Fandong Meng, Jie Zhou, and Yue Zhang. 2024. DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4423–4437, Torino, Italia. ELRA and ICCL.
Cite (Informal):
DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding (Yan et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.395.pdf