Yafei Liu


2021

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
Zhengxu Hou | Bang Liu | Ruihui Zhao | Zijing Ou | Yafei Liu | Xi Chen | Yefeng Zheng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action pairs in real dialogs. In this paper, we propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. Based on inverse adversarial reinforcement learning, our designed reward model can provide more accurate and explainable reward signals for state-action pairs. Extensive evaluations show that our approach can be applied to a wide range of reinforcement learning-based dialog systems and significantly improves both the performance and the speed of convergence.

2020

Speaker or Listener? The Role of a Dialog Agent
Yafei Liu | Hongjin Qian | Hengpeng Xu | Jinmao Wei
Findings of the Association for Computational Linguistics: EMNLP 2020

For decades, chitchat bots have been designed as listeners that passively answer what people ask. This passive and relatively simple dialogue mechanism attracts little attention from users and quickly exhausts their interest. Therefore, some recent studies attempt to endow bots with proactivity through external knowledge, transforming the role from listener to speaker under the hypothesis that a speaker expresses more, much like a knowledge disseminator. However, once proactivity is introduced into a dialogue agent, an issue arises: with too many knowledge facts to express, the agent starts to talk endlessly and sometimes completely ignores what the other party says, which greatly reduces the other party's interest in continuing the conversation. To this end, we propose a novel model named Initiative-Imitate that interacts with adaptive initiative throughout a dialogue. It forces the agent to speak in line with the appropriate role during the whole conversation. Experiments show that the proposed Initiative-Imitate obtains competitive results on both automatic and manual metrics, and that the fluency and engagement of the chatbot are also improved significantly. In addition, the case study indicates that Initiative-Imitate can switch to the appropriate role in a timely manner and respond more properly throughout a continuous conversation.