Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents
Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency
Abstract
We describe an approach for aligning an LLM-based dialogue agent for long-term social dialogue, where only a single global score is given by the user at the end of the session. We propose using denser, naturally occurring multimodal communicative signals as local implicit feedback to improve turn-level utterance generation. Our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies on two large-scale datasets to evaluate GELI and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
- Anthology ID:
- 2024.emnlp-main.881
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15737–15762
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.881/
- DOI:
- 10.18653/v1/2024.emnlp-main.881
- Cite (ACL):
- Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, and Louis-Philippe Morency. 2024. Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15737–15762, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (Lee et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.881.pdf
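
The reward decomposition described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the model or loss, so the linear reward model, the sum-based decomposition, the squared-error shaping term, and all names here (`fit_decomposed_reward`, `turn_rewards`, `lam`, the per-turn feature vectors) are assumptions made purely for illustration. The idea shown is that per-turn rewards are fitted so that they add up to the session-level score (Global Explicit), while a second term nudges each turn's reward toward a per-turn implicit signal (Local Implicit).

```python
import numpy as np


def fit_decomposed_reward(sessions, global_scores, implicit_signals, lam=1.0):
    """Fit a linear per-turn reward r(x) = w . x (an assumed toy model).

    sessions:         list of (T_i, d) arrays of hypothetical turn features
    global_scores:    list of floats, one session-level score per session
    implicit_signals: list of (T_i,) arrays of hypothetical per-turn signals
    lam:              weight of the local shaping term (assumption)

    Minimizes, by ordinary least squares,
        sum_i (sum_t w.x_it - R_i)^2 + lam * sum_i sum_t (w.x_it - s_it)^2
    """
    rows, targets = [], []
    for X, R, s in zip(sessions, global_scores, implicit_signals):
        # Global term: the summed turn rewards should match the session score.
        rows.append(X.sum(axis=0, keepdims=True))
        targets.append(np.array([R], dtype=float))
        # Local term: each turn reward is pulled toward its implicit signal.
        rows.append(np.sqrt(lam) * X)
        targets.append(np.sqrt(lam) * np.asarray(s, dtype=float))
    A = np.vstack(rows)
    b = np.concatenate(targets)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w


def turn_rewards(X, w):
    """Per-turn rewards for one session under the fitted weights."""
    return X @ w


# Synthetic, noiseless data (not from the paper) to exercise the sketch.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.2, 0.1])
sessions = [rng.normal(size=(T, 3)) for T in (4, 6, 5)]
implicit_signals = [X @ w_true for X in sessions]
global_scores = [float(s.sum()) for s in implicit_signals]

w_hat = fit_decomposed_reward(sessions, global_scores, implicit_signals, lam=0.5)
local = turn_rewards(sessions[0], w_hat)  # dense rewards for the first session
```

In an RLHF-style pipeline, the dense values returned by `turn_rewards` would stand in for the single end-of-session score as the training signal for each generated utterance; how the paper actually parameterizes and trains its reward model should be taken from the PDF linked above.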