Cheat Codes to Quantify Missing Source Information in Neural Machine Translation

Proyag Pal, Kenneth Heafield


Abstract
This paper describes a method to quantify the amount of information H(t|s) added by the target sentence t that is not present in the source s in a neural machine translation system. We do this by providing the model with the target sentence in a highly compressed form (a “cheat code”), and exploring the effect of the size of the cheat code. We find that the model is able to capture extra information from just a single float representation of the target and nearly reproduces the target with two 32-bit floats per target token.
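The core idea in the abstract (squeeze the target sentence through a tiny bottleneck and hand the result to the translation model) can be sketched in a few lines. Below is a minimal PyTorch sketch, assuming the cheat code is obtained by mean-pooling target embeddings through a linear bottleneck and adding the re-expanded code to the source encoder states; the class name CheatCodeBottleneck, the pooling, and the injection point are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class CheatCodeBottleneck(nn.Module):
        """Compress the target sentence into a tiny 'cheat code' vector.

        Hypothetical sketch: mean-pool target token embeddings, project
        down to code_dim floats (the bottleneck), then project back up to
        the model dimension so the code can be injected into the encoder.
        """

        def __init__(self, d_model: int, code_dim: int):
            super().__init__()
            self.down = nn.Linear(d_model, code_dim)  # bottleneck: the cheat code itself
            self.up = nn.Linear(code_dim, d_model)    # expand back to model size

        def forward(self, tgt_embeddings: torch.Tensor) -> torch.Tensor:
            # tgt_embeddings: (batch, tgt_len, d_model)
            pooled = tgt_embeddings.mean(dim=1)       # (batch, d_model)
            code = self.down(pooled)                  # (batch, code_dim): the "cheat code"
            return self.up(code)                      # (batch, d_model)

    # Usage sketch: inject the cheat code into the source-side encoder states.
    d_model, code_dim = 512, 2                        # code_dim is the knob the paper varies
    bottleneck = CheatCodeBottleneck(d_model, code_dim)
    src_states = torch.randn(8, 20, d_model)          # encoder output (batch, src_len, d_model)
    tgt_emb = torch.randn(8, 25, d_model)             # embedded target (batch, tgt_len, d_model)
    cheated = src_states + bottleneck(tgt_emb).unsqueeze(1)  # broadcast code over source positions

Varying the size of the code, from a single float per sentence at the low end to two 32-bit floats per target token at the high end, is what lets the experiments trace how much of the missing target information the code supplies.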
Anthology ID:
2022.naacl-main.177
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2472–2477
URL:
https://aclanthology.org/2022.naacl-main.177
DOI:
10.18653/v1/2022.naacl-main.177
Cite (ACL):
Proyag Pal and Kenneth Heafield. 2022. Cheat Codes to Quantify Missing Source Information in Neural Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2472–2477, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Cheat Codes to Quantify Missing Source Information in Neural Machine Translation (Pal & Heafield, NAACL 2022)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2022.naacl-main.177.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2022.naacl-main.177.mp4