Janghoon Han


2023

pdf
Leveraging Ensemble Techniques and Metadata for Subjective Knowledge-grounded Conversational Systems
Seongho Joo | Kang-il Lee | Kyungmin Min | Joongbo Shin | Janghoon Han | Seungpil Won | Kyomin Jung
Proceedings of The Eleventh Dialog System Technology Challenge

The goal of DSTC11 track 5 is to build task-oriented dialogue systems that can effectively utilize external knowledge sources such as FAQs and reviews. This year’s challenge differs from previous ones as it includes subjective knowledge snippets and requires multiple snippets for a single turn. We propose a pipeline system for the challenge focusing on entity tracking, knowledge selection and response generation. Specifically, we devise a novel heuristic to ensemble the outputs from the rule-based method and neural model for entity tracking and knowledge selection. We also leverage metadata information in the knowledge source to handle fine-grained user queries. Our approach achieved the first place in objective evaluation and the third place in human evaluation of DSTC11 track 5.

pdf
BREAK: Breaking the Dialogue State Tracking Barrier with Beam Search and Re-ranking
Seungpil Won | Heeyoung Kwak | Joongbo Shin | Janghoon Han | Kyomin Jung
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the recent advances in dialogue state tracking (DST), the joint goal accuracy (JGA) of the existing methods on MultiWOZ 2.1 still remains merely 60%. In our preliminary error analysis, we find that beam search produces a pool of candidates that is likely to include the correct dialogue state. Motivated by this observation, we introduce a novel framework, called BREAK (Beam search and RE-rAnKing), that achieves outstanding performance on DST. BREAK performs DST in two stages: (i) generating k-best dialogue state candidates with beam search and (ii) re-ranking the candidates to select the correct dialogue state. This simple yet powerful framework shows state-of-the-art performance on all versions of MultiWOZ and M2M datasets. Most notably, we push the joint goal accuracy to 80-90% on MultiWOZ 2.1-2.4, which is an improvement of 23.6%, 26.3%, 21.7%, and 10.8% over the previous best-performing models, respectively. The data and code will be available at https://github.com/tony-won/DST-BREAK

pdf
Local Temperature Beam Search: Avoid Neural Text DeGeneration via Enhanced Calibration
Dongkyu Lee | Gyeonghun Kim | Janghoon Han | Taesuk Hong | Yi-Reun Kim | Stanley Jungkyu Choi | Nevin L. Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Previous studies have constantly observed that a language model repeats itself, creating repetitions in an output sequence. To cope with the issue, stochastic decoding schemes have been the de facto approaches; the strategies add randomness in inference, hence avoiding the “self-loop”. However, the remedy comes at the cost of sacrificing output quality due to the randomness involved. In this work, we introduce a deterministic decoding scheme, local temperature beam search. This inference algorithm is an embarrassingly simple variant of beam search, yet it reduces repetition, whose level is superior to that of a sampling-based decoding algorithm, while maintaining the level of coherence as in beam search. Our idea is rooted in the concept of model calibration; we view a repetition as a casualty from overconfidence in a model. Therefore, our work mitigates the miscalibration present in the course of inference with a post-calibration approach applied in beam-specific manner. Our inference scheme is validated on text completion tasks, in which the repetition problem is seen most clearly, and is exhaustively compared with existing inference schemes.

2022

pdf
TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models
Joel Jang | Seonghyeon Ye | Changho Lee | Sohee Yang | Joongbo Shin | Janghoon Han | Gyeonghun Kim | Minjoon Seo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which was absent or different during training, a phenomenon called temporal misalignment. This is especially a challenging problem because the research community still lacks a coherent dataset for assessing the adaptability of LMs to frequently-updated knowledge corpus such as Wikipedia. To this end, we introduce TemporalWiki, a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM’s ability to retain previous knowledge and acquire updated/new knowledge at each point in time. We also find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than on the entire snapshot in our benchmark with 12 times less computational cost, which verifies that factual knowledge in LMs can be safely updated with minimal training data via continual learning.

2021

pdf
Fine-grained Post-training for Improving Retrieval-based Dialogue Systems
Janghoon Han | Taesuk Hong | Byoungjae Kim | Youngjoong Ko | Jungyun Seo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Retrieval-based dialogue systems display an outstanding performance when pre-trained language models are used, which includes bidirectional encoder representations from transformers (BERT). During the multi-turn response selection, BERT focuses on training the relationship between the context with multiple utterances and the response. However, this method of training is insufficient when considering the relations between each utterance in the context. This leads to a problem of not completely understanding the context flow that is required to select a response. To address this issue, we propose a new fine-grained post-training method that reflects the characteristics of the multi-turn dialogue. Specifically, the model learns the utterance level interactions by training every short context-response pair in a dialogue session. Furthermore, by using a new training objective, the utterance relevance classification, the model understands the semantic relevance and coherence between the dialogue utterances. Experimental results show that our model achieves new state-of-the-art with significant margins on three benchmark datasets. This suggests that the fine-grained post-training method is highly effective for the response selection task.