SURF: Semantic-level Unsupervised Reward Function for Machine Translation

Atijit Anuchitanukul, Julia Ive


Abstract
The performance of Reinforcement Learning (RL) for natural language tasks including Machine Translation (MT) is crucially dependent on the reward formulation. This is due to the intrinsic difficulty of the task in the high-dimensional discrete action space as well as the sparseness of the standard reward functions, which are defined for a limited set of ground-truth sequences biased towards singular lexical choices. To address this issue, we formulate SURF, a maximally dense semantic-level unsupervised reward function which mimics human evaluation by considering both sentence fluency and semantic similarity. We demonstrate the strong potential of SURF to leverage a family of Actor-Critic Transformer-based Architectures with synchronous and asynchronous multi-agent variants. To tackle the problem of large action-state spaces, each agent is equipped with unique exploration strategies, promoting diversity during its exploration of the hypothesis space. When BLEU scores are compared, our dense unsupervised reward outperforms the standard sparse reward by 2% on average for in- and out-of-domain settings.
Anthology ID:
2022.naacl-main.334
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4508–4522
URL:
https://aclanthology.org/2022.naacl-main.334
DOI:
10.18653/v1/2022.naacl-main.334
Cite (ACL):
Atijit Anuchitanukul and Julia Ive. 2022. SURF: Semantic-level Unsupervised Reward Function for Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4508–4522, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
SURF: Semantic-level Unsupervised Reward Function for Machine Translation (Anuchitanukul & Ive, NAACL 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.naacl-main.334.pdf
Video:
https://preview.aclanthology.org/auto-file-uploads/2022.naacl-main.334.mp4
Data:
OpenSubtitles