Harksoo Kim


Pipeline Coreference Resolution Model for Anaphoric Identity in Dialogues
Damrin Kim | Seongsik Park | Mirae Han | Harksoo Kim
Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2022 Shared Task on dialogues consists of three sub-tasks: sub-task 1 is the resolution of anaphoric identity, sub-task 2 is the resolution of bridging references, and sub-task 3 is the resolution of discourse deixis/abstract anaphora. Anaphora resolution is the task of detecting mentions in input documents and clustering the mentions that refer to the same entity. End-to-end models prune candidate mentions, and pruning can remove correct mentions. End-to-end anaphora resolution models also have high model complexity and take a long time to train. Therefore, we perform anaphora resolution with a two-stage pipeline model. In the first step, mention detection, a score is calculated for each candidate word span, and mentions are predicted without pruning. In the second step, anaphora resolution, pairs of mentions in an anaphoric relation are predicted using the mentions from the mention detection step. We propose a two-stage anaphora resolution pipeline model that reduces model complexity and training time while maintaining performance similar to end-to-end models. In our experiments, anaphora resolution achieved 68.27% on Light, 48.87% on AMI, 69.06% on Persuasion, and 60.99% on Switchboard. Our final system ranked 3rd on the leaderboard of sub-task 1.
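
The two-stage flow described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_width` and `threshold` parameters, and the toy rule-based scorers are all assumptions made for illustration, standing in for learned neural scorers. The sketch only shows the control flow: stage 1 keeps every span whose score passes a threshold (no top-k pruning), and stage 2 links each mention to its best-scoring earlier mention.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def detect_mentions(
    tokens: Sequence[str],
    score_span: Callable[[Sequence[str], Span], float],
    max_width: int = 3,      # hypothetical cap on span length
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[Span]:
    """Stage 1: score every candidate span and keep all that pass the threshold.

    No top-k pruning is applied, so a span is never dropped merely
    because other spans scored higher.
    """
    mentions: List[Span] = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_width, len(tokens))):
            if score_span(tokens, (start, end)) >= threshold:
                mentions.append((start, end))
    return mentions


def resolve_mentions(
    tokens: Sequence[str],
    mentions: Sequence[Span],
    score_pair: Callable[[Sequence[str], Span, Span], float],
    threshold: float = 0.5,
) -> List[List[Span]]:
    """Stage 2: link each mention to its best earlier mention, then cluster."""
    clusters: List[List[Span]] = []
    cluster_of: Dict[Span, int] = {}
    for i, mention in enumerate(mentions):
        best, best_score = None, threshold
        for antecedent in mentions[:i]:
            score = score_pair(tokens, antecedent, mention)
            if score >= best_score:
                best, best_score = antecedent, score
        if best is None:
            # No antecedent scored high enough: start a new cluster.
            cluster_of[mention] = len(clusters)
            clusters.append([mention])
        else:
            cluster_of[mention] = cluster_of[best]
            clusters[cluster_of[best]].append(mention)
    return clusters


# Toy scorers (assumptions for the demo, not learned models).
PRONOUNS = {"she", "her"}


def toy_span_scorer(tokens: Sequence[str], span: Span) -> float:
    start, end = span
    if start != end:
        return 0.0
    word = tokens[start]
    return 1.0 if word.lower() in PRONOUNS or word[0].isupper() else 0.0


def toy_pair_scorer(tokens: Sequence[str], antecedent: Span, mention: Span) -> float:
    ante, ment = tokens[antecedent[0]], tokens[mention[0]]
    return 1.0 if ment.lower() in PRONOUNS and ante[0].isupper() else 0.0


tokens = "Alice said she lost her key".split()
mentions = detect_mentions(tokens, toy_span_scorer)
clusters = resolve_mentions(tokens, mentions, toy_pair_scorer)
```

On the toy sentence, the three single-token mentions end up in one cluster. In a real system both scorers would be trained models, and because stage 2 only sees the mentions stage 1 kept, the two stages can be trained separately.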


The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference
Hongjin Kim | Damrin Kim | Harksoo Kim
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The objective of the anaphora resolution in dialogue shared task is to go above and beyond the simple cases of coreference resolution in written text on which NLP has mostly focused so far, which arguably overestimate the performance of current SOTA models. The shared task consists of three subtasks: subtask 1, resolution of anaphoric identity and non-referring expression identification; subtask 2, resolution of bridging references; and subtask 3, resolution of discourse deixis/abstract anaphora. In this paper, we propose a pipelined model (i.e., resolution of anaphoric identity and resolution of bridging references) for subtask 1 and subtask 2. In subtask 1, our model detects mentions via parenthesis prediction. Then, we build each mention representation from the representations of the tokens that constitute the mention. The mention representations are fed to the coreference resolution model for clustering. In subtask 2, our model resolves bridging references via the MRC framework. We construct a query for each entity as “What is related of ENTITY?”. The input to our model is the query and the document (i.e., all utterances of the dialogue). Then, our model predicts the entity span that answers the query.
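
The query construction for subtask 2 can be sketched in a few lines of Python. The query wording is quoted from the abstract; everything else (the function name `build_mrc_input`, joining utterances with spaces, and the `[SEP]` separator) is an assumption for illustration, since the abstract does not specify how the query and the dialogue are joined.

```python
from typing import Sequence

# Query wording as quoted in the abstract, with ENTITY as the slot to fill.
QUERY_TEMPLATE = "What is related of {entity}?"


def build_mrc_input(
    entity: str,
    utterances: Sequence[str],
    sep: str = " [SEP] ",  # hypothetical separator between query and document
) -> str:
    """Pair the per-entity query with the whole dialogue as one MRC input."""
    query = QUERY_TEMPLATE.format(entity=entity)
    document = " ".join(utterances)  # all utterances of the dialogue
    return query + sep + document


example = build_mrc_input(
    "the house",
    ["I saw the house .", "The door was open ."],
)
```

An MRC model would then be asked to predict an answer span inside the document part of this input, which is read as the entity that the queried mention bridges to.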

Deep Context- and Relation-Aware Learning for Aspect-based Sentiment Analysis
Shinhyeok Oh | Dongyub Lee | Taesun Whang | IlNam Park | Seo Gaeun | EungGyun Kim | Harksoo Kim
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Existing works for aspect-based sentiment analysis (ABSA) have adopted a unified approach, which allows the interactive relations among subtasks. However, we observe that these methods tend to predict polarities based on the literal meaning of aspect and opinion terms and mainly consider relations implicitly among subtasks at the word level. In addition, identifying multiple aspect–opinion pairs with their polarities is much more challenging. Therefore, a comprehensive understanding of contextual information w.r.t. the aspect and opinion is further required in ABSA. In this paper, we propose Deep Contextualized Relation-Aware Network (DCRAN), which allows interactive relations among subtasks with deep contextual information based on two modules (i.e., Aspect and Opinion Propagation and Explicit Self-Supervised Strategies). In particular, we design novel self-supervised strategies for ABSA, which have strengths in dealing with multiple aspects. Experimental results show that DCRAN significantly outperforms previous state-of-the-art methods by large margins on three widely used benchmarks.

Document-Grounded Goal-Oriented Dialogue Systems on Pre-Trained Language Model with Diverse Input Representation
Boeun Kim | Dohaeng Lee | Sihyung Kim | Yejin Lee | Jin-Xia Huang | Oh-Woog Kwon | Harksoo Kim
Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)

A document-grounded goal-oriented dialogue system understands users’ utterances and generates proper responses by using information obtained from documents. The DialDoc21 shared task consists of two subtasks: subtask 1, finding text spans associated with users’ utterances from documents, and subtask 2, generating responses based on information obtained from subtask 1. In this paper, we propose two models (i.e., a knowledge span prediction model and a response generation model) for subtask 1 and subtask 2. In subtask 1, dialogue act losses are used with RoBERTa, and title embeddings are added to the input representation of RoBERTa. In subtask 2, various special tokens and embeddings are added to the input representation of BART’s encoder. Then, we propose a method to assign different difficulty scores to leverage curriculum learning. In subtask 1, our span prediction model achieved F1-scores of 74.81 (ranked at top 7) and 73.41 (ranked at top 5) in the test-dev phase and test phase, respectively. In subtask 2, our response generation model achieved sacreBLEU scores of 37.50 (ranked at top 3) and 41.06 (ranked at top 1) in the test-dev phase and test phase, respectively.
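
The abstract does not say how difficulty scores are assigned or how the curriculum is scheduled. As a generic illustration only, the Python sketch below shows one common way to use such scores: sort the training examples from easy to hard and train on progressively larger subsets. The function name `curriculum_stages`, the `num_stages` parameter, and the length-based difficulty function are assumptions, not the authors' method.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],  # hypothetical scoring function
    num_stages: int = 3,               # hypothetical number of stages
) -> List[List[T]]:
    """Return cumulative training subsets ordered from easy to hard.

    Stage k contains the easiest k/num_stages fraction of the data,
    so the final stage covers the full training set.
    """
    ranked = sorted(examples, key=difficulty)
    stages: List[List[T]] = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ranked) * k / num_stages)
        stages.append(ranked[:cutoff])
    return stages


# Toy difficulty (an assumption): longer responses count as harder.
responses = ["a b c d", "a", "a b", "a b c"]
stages = curriculum_stages(responses, lambda r: len(r.split()), num_stages=2)
```

Here the first stage holds the two shortest responses and the second stage holds all four; a training loop would run one or more epochs per stage.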


ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples
Cheoneum Park | Juae Kim | Hyeon-gu Lee | Reinald Kim Amplayo | Harksoo Kim | Jungyun Seo | Changki Lee
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system, Joint Encoders for Stable Suggestion Inference (JESSI), for the SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. JESSI is a combination of two sentence encoders: (a) one using multiple pre-trained word embeddings learned from log-bilinear regression (GloVe) and translation (CoVe) models, and (b) one on top of word encodings from a pre-trained deep bidirectional transformer (BERT). We include a domain adversarial training module when training for out-of-domain samples. Our experiments show that while BERT performs exceptionally well for in-domain samples, several runs of the model show that it is unstable for out-of-domain samples. The problem is mitigated tremendously by (1) combining BERT with a non-BERT encoder, and (2) using an RNN-based classifier on top of BERT. Our final models obtained second place with 77.78% F-Score on Subtask A (i.e. in-domain) and achieved an F-Score of 79.59% on Subtask B (i.e. out-of-domain), even without using any additional external data.

Relation Extraction among Multiple Entities Using a Dual Pointer Network with a Multi-Head Attention Mechanism
Seong Sik Park | Harksoo Kim
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Many previous studies on relation extraction have been focused on finding only one relation between two entities in a single sentence. However, we can easily find the fact that multiple entities exist in a single sentence and the entities form multiple relations. To resolve this problem, we propose a relation extraction model based on a dual pointer network with a multi-head attention mechanism. The proposed model finds n-to-1 subject-object relations by using a forward decoder called an object decoder. Then, it finds 1-to-n subject-object relations by using a backward decoder called a subject decoder. In the experiments with the ACE-05 dataset and the NYT dataset, the proposed model achieved the state-of-the-art performances (F1-score of 80.5% in the ACE-05 dataset, F1-score of 78.3% in the NYT dataset).


Two-Step Training and Mixed Encoding-Decoding for Implementing a Generative Chatbot with a Small Dialogue Corpus
Jintae Kim | Hyeon-Gu Lee | Harksoo Kim | Yeonsoo Lee | Young-Gil Kim
Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG)


KSAnswer: Question-answering System of Kangwon National University and Sogang University in the 2016 BioASQ Challenge
Hyeon-gu Lee | Minkyoung Kim | Harksoo Kim | Juae Kim | Sunjae Kwon | Jungyun Seo | Yi-reun Kim | Jung-Kyu Choi
Proceedings of the Fourth BioASQ workshop


Speakers’ Intention Prediction Using Statistics of Multi-level Features in a Schedule Management Domain
Donghyun Kim | Hyunjung Lee | Choong-Nyoung Seon | Harksoo Kim | Jungyun Seo
Proceedings of ACL-08: HLT, Short Papers

Information extraction using finite state automata and syllable n-grams in a mobile environment
Choong-Nyoung Seon | Harksoo Kim | Jungyun Seo
Proceedings of the ACL-08: HLT Workshop on Mobile Language Processing


A Reliable Indexing Method for a Practical QA System
Harksoo Kim | Jungyun Seo
COLING-02: Multilingual Summarization and Question Answering


MAYA: A Fast Question-answering System Based on a Predictive Answer Indexer
Harksoo Kim | Kyungsun Kim | Gary Geunbae Lee | Jungyun Seo
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering


Anaphora Resolution using Extended Centering Algorithm in a Multi-modal Dialogue System
Harksoo Kim | Jeong-Mi Cho | Jungyun Seo
The Relation of Discourse/Dialogue Structure and Reference