Hiroaki Saito

Traditionally, approximate dynamic programming is employed in dialogue generation with greedy policy improvement through action sampling, because the natural language action space is vast. However, this practice is inefficient for reinforcement learning (RL): eligible responses with high action values are sparse, so the improvement sustained by random sampling is weak. This paper presents theoretical analysis and experiments showing that the performance of the dialogue policy is positively correlated with the sampling size. To overcome this limitation, we introduce a novel dual-granularity Q-function that explores the most promising response category and uses it to intervene in the sampling process. Our approach extracts actions following the grained hierarchy, thereby reaching the optimum with fewer policy iterations. Additionally, we use offline RL and learn from multiple reward functions designed to capture emotional nuances in human interactions. Empirical studies demonstrate that our algorithm outperforms baselines on both automatic metrics and human evaluations. Further testing shows that our algorithm is both explainable and controllable, and that it generates responses with higher expected rewards.
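To make the contrast in this abstract concrete, here is a minimal sketch of two ways to pick a response greedily. The first samples n candidates uniformly and keeps the one with the highest Q-value; the second first chooses the response category with the highest coarse Q-value and then samples only inside that category. Everything here is an assumption for illustration: the function names `greedy_by_sampling` and `dual_granularity_select`, the category labels, and the hand-set Q-values are hypothetical, and this is not the paper's implementation, which learns both Q-functions.

```python
import random
from typing import Callable, Dict, List, Sequence


def greedy_by_sampling(candidates: Sequence[str],
                       q: Callable[[str], float],
                       n: int,
                       rng: random.Random) -> str:
    """Sample n candidates uniformly (without replacement) and keep the argmax of q."""
    sampled = rng.sample(list(candidates), k=min(n, len(candidates)))
    return max(sampled, key=q)


def dual_granularity_select(categories: Dict[str, List[str]],
                            q_coarse: Callable[[str], float],
                            q_fine: Callable[[str], float],
                            n: int,
                            rng: random.Random) -> str:
    """Pick the category with the highest coarse Q, then sample only within it."""
    best_category = max(categories, key=q_coarse)
    return greedy_by_sampling(categories[best_category], q_fine, n, rng)


# Hypothetical toy data: hand-set Q-values stand in for learned ones.
fine_q = {
    "I'm sorry to hear that. How are you holding up?": 0.9,
    "That sounds really hard.": 0.8,
    "OK.": 0.3,
    "I see.": 0.35,
    "Did you watch the game?": 0.1,
    "Nice weather today.": 0.05,
}
categories = {
    "empathetic": ["I'm sorry to hear that. How are you holding up?",
                   "That sounds really hard."],
    "neutral": ["OK.", "I see."],
    "off_topic": ["Did you watch the game?", "Nice weather today."],
}
coarse_q = {"empathetic": 0.85, "neutral": 0.3, "off_topic": 0.1}

# With n=1, uniform sampling over all six responses hits an empathetic one
# only 2 times in 6, whereas the category-first selection always does.
choice = dual_granularity_select(categories, lambda c: coarse_q[c],
                                 lambda r: fine_q[r], n=1, rng=random.Random(0))
print(choice)
```

The point of the sketch is the sample-size dependence the abstract describes: plain sampling only finds the best response reliably when n approaches the size of the candidate pool, while restricting sampling to a well-chosen category shrinks that pool.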

2022

Current work on personalized dialogue generation primarily focuses on having the agent present a consistent personality and produce more informative responses. However, we found that the responses generated by most previous models tend to be self-centered, paying little attention to the user in the dialogue. Moreover, we consider that human-like conversation is essentially built on inferring information about the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator that detects an implicit user persona. Because it is hard to collect a large number of detailed personas for each user, we model the user's potential persona and its representation from the dialogue history alone, with no external knowledge. The perception and fader variables are formulated using conditional variational inference. These two latent variables simulate the process of people becoming aware of each other's persona and producing a corresponding expression in conversation. Finally, posterior-discriminated regularization is introduced to enhance the training procedure. Empirical studies demonstrate that, compared to state-of-the-art methods, our approach pays more attention to the user's persona and achieves a considerable boost on both automatic metrics and human evaluations.
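Conditional variational inference of the kind mentioned here typically relies on two generic building blocks: sampling a latent variable with the reparameterization trick, and a KL term between a posterior (recognition) Gaussian and a prior Gaussian in the training objective. The sketch below shows only those two generic pieces for diagonal Gaussians, in plain Python. The names `reparameterize` and `gaussian_kl` are hypothetical, and how the paper's perception and fader variables or its posterior-discriminated regularization use them is not shown and should not be inferred from this code.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu_q: Sequence[float], log_var_q: Sequence[float],
                mu_p: Sequence[float], log_var_p: Sequence[float]) -> float:
    """KL(q || p) for diagonal Gaussians q (posterior) and p (prior)."""
    kl = 0.0
    for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p):
        # Per-dimension closed form:
        # 0.5 * (log var_p - log var_q + (var_q + (mq - mp)^2) / var_p - 1)
        kl += 0.5 * (lvp - lvq + (math.exp(lvq) + (mq - mp) ** 2) / math.exp(lvp) - 1.0)
    return kl


# Toy usage: a 2-dimensional latent with hypothetical posterior/prior parameters.
z = reparameterize([0.5, -0.2], [0.0, -1.0], random.Random(0))
print(len(z), gaussian_kl([1.0], [0.0], [0.0], [0.0]))
```

In a standard conditional VAE objective, this KL value is subtracted from the expected reconstruction log-likelihood, so the posterior is pulled toward the prior that is conditioned only on the dialogue context.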

2010

2008

This paper describes an ongoing project, Japanese FrameNet (JFN), a corpus-based lexicon of Japanese in the FrameNet style. It focuses on the set of software tools tailored to the JFN annotation process. As the first step in annotation, annotators select target sentences from the JFN corpus using the JFN KWIC (keyword-in-context) search tool, in which they can specify co-occurring words and/or the part of speech of collocates. The search tool can display the parse tree of a target sentence and its neighbouring sentences. The JFN corpus consists mainly of a balanced, copyright-free Japanese corpus that is being built as a national project. After a sentence is chosen for annotation, the annotator assigns syntactic and semantic tags to the appropriate phrases in the sentence. This work is performed on an annotation platform called JFNDesktop, which provides labeling assistance and consistency checking of annotations. A preliminary evaluation of the platform shows that these functions accelerate the annotation process.
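The search step described above, a keyword-in-context lookup filtered by a co-occurring word and/or the part of speech of collocates, can be sketched as follows. This is a minimal illustration only, assuming pre-tokenized sentences given as (surface, part-of-speech) pairs and a fixed context window; the function name `kwic`, its parameters, and the toy sentences are hypothetical and do not describe the actual JFN tool.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)


def kwic(sentences: List[List[Token]], target: str,
         cooccur: Optional[str] = None,
         collocate_pos: Optional[str] = None,
         window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each hit of `target`.

    A hit is kept only if the tokens within `window` positions of the keyword
    contain `cooccur` (when given) and a token tagged `collocate_pos` (when given).
    """
    hits = []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (w, _) in enumerate(sent):
            if w != target:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            context = sent[lo:i] + sent[i + 1:hi]
            if cooccur is not None and cooccur not in (cw for cw, _ in context):
                continue
            if collocate_pos is not None and collocate_pos not in (cp for _, cp in context):
                continue
            # Japanese is written without spaces, so contexts are joined directly.
            hits.append(("".join(words[lo:i]), w, "".join(words[i + 1:hi])))
    return hits


# Hypothetical pre-tokenized corpus.
corpus = [
    [("彼", "名詞"), ("が", "助詞"), ("本", "名詞"), ("を", "助詞"), ("買う", "動詞")],
    [("本", "名詞"), ("を", "助詞"), ("読む", "動詞")],
    [("車", "名詞"), ("を", "助詞"), ("買う", "動詞")],
]

for left, key, right in kwic(corpus, "買う", cooccur="本"):
    print(f"{left} [{key}] {right}")
```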

2004

2003

2002

2000

1994

1992

1990

1989

This paper describes a speech parsing method called HMM-LR. In HMM-LR, an LR parsing table is used to predict phones in the speech input, and the system drives an HMM-based speech recognizer directly, without any intervening structure such as a phone lattice. Highly accurate and efficient speech parsing is achieved by integrating speech recognition and language analysis. The HMM-LR method is applied to large-vocabulary, speaker-dependent Japanese phrase recognition. The recognition rate is 87.1% for the top candidate and 97.7% within the five best candidates.
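The two reported figures are top-1 and top-5 recognition rates: the fraction of test phrases whose correct transcription is the first candidate, or appears anywhere among the first five candidates. A minimal sketch of that computation follows; the function name `top_k_rate` and the romanized toy phrases are hypothetical and are not data from the paper.

```python
from typing import List, Sequence


def top_k_rate(references: Sequence[str],
               nbest_lists: Sequence[List[str]],
               k: int) -> float:
    """Fraction of utterances whose reference appears among the first k hypotheses."""
    if len(references) != len(nbest_lists):
        raise ValueError("need one n-best list per reference")
    hits = sum(ref in hyps[:k] for ref, hyps in zip(references, nbest_lists))
    return hits / len(references)


# Hypothetical toy n-best lists (best candidate first).
refs = ["kaigi", "touroku", "yoyaku", "denwa"]
nbest = [
    ["kaigi", "kaiki"],
    ["tooroku", "touroku"],
    ["yoyaku"],
    ["denki", "tenwa", "denpa", "denwa", "genwa"],
]

print(top_k_rate(refs, nbest, k=1), top_k_rate(refs, nbest, k=5))
```

Because a correct candidate counted at rank 1 is also counted at every larger k, the top-5 rate can never be lower than the top-1 rate.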