2025
pdf
bib
abs
MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU
Yan Li
|
So-Eon Kim
|
Seong-Bae Park
|
Caren Han
Findings of the Association for Computational Linguistics: NAACL 2025
Although Large Language Models (LLMs) can generate coherent text, they often struggle to recognise user intent behind queries. In contrast, Natural Language Understanding (NLU) models interpret the purpose and key information of user input for responsive interactions. Existing NLU models typically map utterances to a dual-level semantic frame, involving sentence-level intent (SI) and word-level slot (WS) labels. However, real-life conversations primarily consist of multi-turn dialogues, requiring the interpretation of complex and extended exchanges. Researchers encounter challenges in addressing all facets of multi-turn dialogue using a unified NLU model. This paper introduces MIDAS, a novel approach leveraging multi-level intent, domain, and slot knowledge distillation for multi-turn NLU. We construct distinct teachers for SI detection, WS filling, and conversation-level domain (CD) classification, each fine-tuned for specific knowledge. A multi-teacher loss is proposed to facilitate the integration of these teachers, guiding a student model in multi-turn dialogue tasks. Results demonstrate the efficacy of our model in improving multi-turn conversation understanding, showcasing the potential for advancements in NLU through multi-level dialogue knowledge distillation. Our implementation is open-sourced on GitHub (https://github.com/adlnlp/Midas).
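The multi-teacher loss is only named in this abstract; as a rough sketch of how distillation from separate SI, WS, and CD teachers into one student could be combined (the per-task weighting, temperature, and dict-of-logits layout below are assumptions, not the paper's specification):

```python
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits, weights, temperature=2.0):
    """Weighted soft-label distillation over several task-specific teachers.

    student_logits / teacher_logits: dicts keyed by task ('si', 'ws', 'cd')
    holding logits of matching shape; weights: hypothetical per-task weights.
    """
    loss = 0.0
    for task, w in weights.items():
        s = F.log_softmax(student_logits[task] / temperature, dim=-1)
        t = F.softmax(teacher_logits[task] / temperature, dim=-1)
        # KL between softened distributions, scaled by T^2 as in standard KD.
        loss = loss + w * F.kl_div(s, t, reduction="batchmean") * temperature ** 2
    return loss
```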
2022
pdf
bib
abs
Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator
Gyu-Min Park
|
Seong-Eun Hong
|
Seong-Bae Park
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Pre-trained language models such as KoBART often fail to generate well-formed interrogative sentences when applied to Korean question generation. This is mainly because such models have been trained largely on declarative sentences and have little exposure to interrogative ones. This paper therefore proposes a novel post-training of KoBART that enhances it for Korean question generation in three ways: (i) a question infilling objective that forces KoBART to focus on the structure of interrogative sentences, (ii) augmentation of the question-generation training data with another data set to cope with the lack of training instances for post-training, and (iii) a Korean spacing objective that helps KoBART capture the linguistic features of Korean. Since there is no standard data set for Korean question generation, this paper also proposes KorQuAD-QG, a new data set for this task, to verify the performance of the proposed post-training. Our code is publicly available at https://github.com/gminipark/post_training_qg.
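The question infilling objective is described only at a high level above; by analogy with BART-style text infilling, one way to picture it is to mask tokens that carry interrogative structure and train the model to reconstruct them. The cue inventory and masking scheme below are invented for illustration and are not the paper's implementation:

```python
import random

# Hypothetical interrogative cues (Korean wh-words); the actual cues used
# for post-training may differ.
QUESTION_CUES = {"누가", "언제", "어디서", "무엇을", "왜", "어떻게"}

def question_infilling(tokens, mask_token="<mask>", extra_rate=0.15):
    """Corrupt a tokenised question so a seq2seq model such as KoBART
    must reconstruct its interrogative structure."""
    corrupted = []
    for tok in tokens:
        if tok in QUESTION_CUES or random.random() < extra_rate:
            # Collapse adjacent masked tokens into one mask, infilling-style.
            if not corrupted or corrupted[-1] != mask_token:
                corrupted.append(mask_token)
        else:
            corrupted.append(tok)
    return corrupted  # source sequence; the original tokens are the target
```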
2019
pdf
bib
abs
Korean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model
Hyun-Je Song
|
Seong-Bae Park
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Korean morphological analysis has been treated as a sequence of morpheme processing followed by POS tagging, and thus previous studies have widely adopted a pipeline model of the two tasks. However, a pipeline cannot exploit the interactions between the tasks. This paper formulates Korean morphological analysis as a combination of the two tasks and presents a tied sequence-to-sequence multi-task model that trains them simultaneously without any explicit regularization. Experiments show that the proposed model achieves state-of-the-art performance.
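As a rough picture of training the two tasks jointly over a shared encoder (the module sizes, head structure, and unweighted loss sum below are illustrative assumptions, not the paper's tied architecture):

```python
import torch.nn as nn

class JointMorphPosModel(nn.Module):
    """Shared encoder feeding two output heads, one for morphemes and one
    for POS tags; an illustrative stand-in for the paper's tied model."""

    def __init__(self, vocab_size, morph_vocab, pos_vocab, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.morph_head = nn.Linear(d, morph_vocab)  # morpheme processing
        self.pos_head = nn.Linear(d, pos_vocab)      # POS tagging

    def forward(self, x):
        h, _ = self.encoder(self.embed(x))
        return self.morph_head(h), self.pos_head(h)

def joint_loss(morph_logits, pos_logits, morph_gold, pos_gold):
    ce = nn.CrossEntropyLoss()
    # Plain sum of the task losses: both tasks are trained simultaneously
    # with no explicit regularization term, as the abstract states.
    return (ce(morph_logits.transpose(1, 2), morph_gold)
            + ce(pos_logits.transpose(1, 2), pos_gold))
```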
2017
pdf
bib
Proceedings of the IJCNLP 2017, System Demonstrations
Seong-Bae Park
|
Thepchai Supnithi
Proceedings of the IJCNLP 2017, System Demonstrations
pdf
bib
abs
WiseReporter: A Korean Report Generation System
Yunseok Noh
|
Su Jeong Choi
|
Seong-Bae Park
|
Se-Young Park
Proceedings of the IJCNLP 2017, System Demonstrations
We demonstrate a report generation system called WiseReporter. WiseReporter generates a textual report on a specific topic, usually given as a keyword, by verbalizing knowledge-base facts involving that topic. The demonstration shows not only the report itself but also the process by which its sentences are generated. We plan to enhance WiseReporter by adding deep-learning-based data analysis and text summarization.
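The verbalization step is only described in prose; a minimal template-based sketch of turning knowledge-base triples about a topic into report sentences (the relations, templates, and example fact are invented for illustration) might look like:

```python
# Hypothetical relation-to-template mapping; WiseReporter's actual
# verbalization rules are not shown in this listing.
TEMPLATES = {
    "birth_place": "{subj} was born in {obj}.",
    "occupation": "{subj} was {obj}.",
}

def verbalize(topic, triples):
    """Render (subject, relation, object) facts about a topic as sentences."""
    return " ".join(
        TEMPLATES[rel].format(subj=subj, obj=obj)
        for subj, rel, obj in triples
        if subj == topic and rel in TEMPLATES
    )

print(verbalize("Yi Sun-sin", [("Yi Sun-sin", "occupation", "a naval commander")]))
```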
2016
pdf
bib
A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations
Hee-Geun Yoon
|
Hyun-Je Song
|
Seong-Bae Park
|
Se-Young Park
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2014
pdf
bib
Device-Dependent Readability for Improved Text Understanding
A-Yeong Kim
|
Hyun-Je Song
|
Seong-Bae Park
|
Sang-Jo Lee
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
pdf
bib
A Just-In-Time Keyword Extraction from Meeting Transcripts
Hyun-Je Song
|
Junho Go
|
Seong-Bae Park
|
Se-Young Park
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2012
pdf
bib
A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
Hyun-Je Song
|
Jeong-Woo Son
|
Tae-Gil Noh
|
Seong-Bae Park
|
Sang-Jo Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2006
pdf
bib
Self-Organizing n-gram Model for Automatic Word Spacing
Seong-Bae Park
|
Yoon-Shik Tae
|
Se-Young Park
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
2005
pdf
bib
Augmentation of Modality Translation Rules in Korean-to-English Machine Translation by Rule Learning
Seong-Bae Park
|
Jeong-Woo Son
|
Yoon-Shik Tae
Proceedings of Machine Translation Summit X: Papers
2003
pdf
bib
Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning
Seong-Bae Park
|
Byoung-Tak Zhang
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
2000
pdf
bib
abs
Machine translation systems: E-K, K-E, J-K, K-J
Yu Seop Kim
|
Sung Dong Kim
|
Seong Bae Park
|
Jong Woo Lee
|
Jeong Ho Chang
|
Kyu Baek Hwang
|
Min O Jang
|
Yung Taek Kim
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: User Studies
We describe four machine translation systems: E-K (English to Korean), K-E (Korean to English), J-K (Japanese to Korean), and K-J (Korean to Japanese). Among these, the E-K and K-J systems have been released commercially, and the other two have completed development. This paper describes the structure and function of each system with figures and translation results.
pdf
bib
Word Sense Disambiguation by Learning from Unlabeled Data
Seong-Bae Park
|
Byoung-Tak Zhang
|
Yung Taek Kim
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics