Most existing approaches for goal-oriented dialogue policy learning used reinforcement learning, which focuses on the target agent policy and simply treats the opposite agent policy as part of the environment. While in real-world scenarios, the behavior of an opposite agent often exhibits certain patterns or underlies hidden policies, which can be inferred and utilized by the target agent to facilitate its own decision making. This strategy is common in human mental simulation by first imaging a specific action and the probable results before really acting it. We therefore propose an opposite behavior aware framework for policy learning in goal-oriented dialogues. We estimate the opposite agent’s policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy. We evaluate our model on both cooperative and competitive dialogue tasks, showing superior performance over state-of-the-art baselines.
Recurrent neural networks (RNNs) suffer from well-known limitations and complications which include slow inference and vanishing gradients when processing long sequences in text classification. Recent studies have attempted to accelerate RNNs via various ad hoc mechanisms to skip irrelevant words in the input. However, word skipping approaches proposed to date effectively stop at each or a given time step to decide whether or not a given input word should be skipped, breaking the coherence of input processing in RNNs. Furthermore, current methods cannot change skip rates during inference and are consequently unable to support different skip rates in demanding real-world conditions. To overcome these limitations, we propose Pointer- LSTM, a novel LSTM framework which relies on a pointer network to select important words for target prediction. The model maintains a coherent input process for the LSTM modules and makes it possible to change the skip rate during inference. Our evaluation on four public data sets demonstrates that Pointer-LSTM (a) is 1.1x∼3.5x faster than the standard LSTM architecture; (b) is more accurate than Leap-LSTM (the state-of-the-art LSTM skipping model) at high skip rates; and (c) reaches robust accuracy levels even when the skip rate is changed during inference.