Abstract
The performance of Reinforcement Learning (RL) for natural language tasks including Machine Translation (MT) is crucially dependent on the reward formulation. This is due to the intrinsic difficulty of the task in the high-dimensional discrete action space as well as the sparseness of the standard reward functions defined for limited set of ground-truth sequences biased towards singular lexical choices. To address this issue, we formulate SURF, a maximally dense semantic-level unsupervised reward function which mimics human evaluation by considering both sentence fluency and semantic similarity. We demonstrate the strong potential of SURF to leverage a family of Actor-Critic Transformer-based Architectures with synchronous and asynchronous multi-agent variants. To tackle the problem of large action-state spaces, each agent is equipped with unique exploration strategies, promoting diversity during its exploration of the hypothesis space. When BLEU scores are compared, our dense unsupervised reward outperforms the standard sparse reward by 2% on average for in- and out-of-domain settings.- Anthology ID:
- 2022.naacl-main.334
- Volume:
- Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4508–4522
- Language:
- URL:
- https://aclanthology.org/2022.naacl-main.334
- DOI:
- 10.18653/v1/2022.naacl-main.334
- Cite (ACL):
- Atijit Anuchitanukul and Julia Ive. 2022. SURF: Semantic-level Unsupervised Reward Function for Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4508–4522, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- SURF: Semantic-level Unsupervised Reward Function for Machine Translation (Anuchitanukul & Ive, NAACL 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.naacl-main.334.pdf
- Data
- OpenSubtitles