Gary Geunbae Lee

Also published as: Gary Geunbae Lee, Geunbae Lee


2024

pdf
Denoising Table-Text Retrieval for Open-Domain Question Answering
Deokhyung Kang | Baikjin Jung | Yunsu Kim | Gary Geunbae Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering face two common challenges: first, their retrievers can be affected by false-positive labels in training datasets; second, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose Denoised Table-Text Retriever (DoTTeR). Our approach uses a denoised training dataset with fewer false-positive labels, obtained by discarding instances with lower question-relevance scores as measured by a false-positive detection model. Subsequently, we integrate table-level ranking information into the retriever to assist in finding evidence for questions that demand reasoning across the table. To encode this ranking information, we fine-tune a rank-aware column encoder to identify minimum and maximum values within a column. Experimental results demonstrate that DoTTeR significantly outperforms strong baselines on both retrieval recall and downstream QA tasks. Our code is available at https://github.com/deokhk/DoTTeR.
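
As a rough illustration of the denoising step described in this abstract, the sketch below discards the training instances with the lowest question-relevance scores. The `Instance` type, its `score` field, and the `keep_ratio` parameter are assumptions made for illustration only; the abstract does not say how the cutoff is chosen, and the authors' actual implementation is the one in the linked repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    question: str
    evidence: str
    score: float  # hypothetical question-relevance score from a detection model


def denoise(instances: List[Instance], keep_ratio: float = 0.8) -> List[Instance]:
    """Keep the highest-scoring fraction of instances and drop the rest."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not instances:
        return []
    # Sort from most to least relevant, then cut off the low-score tail.
    ranked = sorted(instances, key=lambda inst: inst.score, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]
```

A ratio-based cutoff is only one option; a fixed score threshold would serve the same purpose.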

pdf
Explainable Multi-hop Question Generation: An End-to-End Approach without Intermediate Question Labeling
Seonjeong Hwang | Yunsu Kim | Gary Geunbae Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In response to the increasing use of interactive artificial intelligence, the demand for the capacity to handle complex questions has increased. Multi-hop question generation aims to generate complex questions that require multi-step reasoning over several documents. Previous studies have predominantly utilized end-to-end models, wherein questions are decoded based on the representation of context documents. However, these approaches lack the ability to explain the reasoning process behind the generated multi-hop questions. Additionally, the question rewriting approach, which incrementally increases the question complexity, is limited by the need for labeled data for intermediate-stage questions. In this paper, we introduce an end-to-end question rewriting model that increases question complexity through sequential rewriting. The proposed model has the advantage of training with only the final multi-hop questions, without intermediate questions. Experimental results demonstrate the effectiveness of our model in generating complex questions, particularly 3- and 4-hop questions, which are appropriately paired with input answers. We also prove that our model logically and incrementally increases the complexity of questions, and that the generated multi-hop questions are beneficial for training question answering models.

pdf
Leveraging the Interplay between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
Yejin Jeon | Yunsu Kim | Gary Geunbae Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and FastSpeech variants show substantial pausing errors when applied to the Korean language, which affects speech perception and naturalness. In order to address the aforementioned issues, we propose a novel framework that incorporates comprehensive modeling of both syntactic and acoustic cues that are associated with pausing patterns. Remarkably, our framework possesses the capability to consistently generate natural speech even for considerably more extended and intricate out-of-domain (OOD) sentences, despite its training on short audio clips. Architectural design choices are validated through comparisons with baseline models and ablation studies using subjective and objective metrics, thus confirming model performance.

2023

pdf
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
Heejin Do | Yunsu Kim | Gary Geunbae Lee
Findings of the Association for Computational Linguistics: ACL 2023

Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume that the essays to be graded are written for the same prompt used in training, and they assign only a holistic score. However, such settings conflict with real-education situations; pre-graded essays for a particular prompt are lacking, and detailed trait scores of sub-rubrics are required. Thus, predicting various trait scores of unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining challenge of AES. In this paper, we propose a robust model: prompt- and trait relation-aware cross-prompt essay trait scorer. We encode a prompt-aware essay representation using essay-prompt attention and a topic-coherence feature extracted by a topic-modeling mechanism without access to labeled data; therefore, our model considers the prompt adherence of an essay, even in a cross-prompt setting. To facilitate multi-trait scoring, we design a trait-similarity loss that encapsulates the correlations of traits. Experiments prove the efficacy of our model, showing state-of-the-art results for all prompts and traits. Significant improvements in low-resource-prompt and inferior traits further indicate our model’s strength.

pdf
DORIC: Domain Robust Fine-Tuning for Open Intent Clustering through Dependency Parsing
Jihyun Lee | Seungyeon Seo | Yunsu Kim | Gary Geunbae Lee
Proceedings of The Eleventh Dialog System Technology Challenge

We present our work on Track 2 in the Dialog System Technology Challenges 11 (DSTC11). DSTC11-Track2 aims to provide a benchmark for zero-shot, cross-domain, intent-set induction. In the absence of an in-domain training dataset, a robust utterance representation that can be used across domains is necessary to induce users’ intentions. To achieve this, we leveraged a multi-domain dialogue dataset to fine-tune the language model and proposed extracting Verb-Object pairs to remove the artifacts of unnecessary information. Furthermore, we devised a method that generates each cluster’s name for the explainability of clustered results. Our approach achieved 3rd place in the precision score and showed higher accuracy and normalized mutual information (NMI) scores than the baseline model on various domain datasets.

pdf
Exploring Back Translation with Typo Noise for Enhanced Inquiry Understanding in Task-Oriented Dialogue
Jihyun Lee | Junseok Kim | Gary Geunbae Lee
Proceedings of The Eleventh Dialog System Technology Challenge

This paper presents our approach to the DSTC11 Track 5 selection task, which focuses on retrieving appropriate natural language knowledge sources for task-oriented dialogue. We propose a typologically diverse back-translation method with typo noise, which can generate variously structured user inquiries. Through our noised back translation, we augmented inquiries by combining three different typologies of language sources with five different typo noise injections. Our experiments demonstrate that typological variety and typo noise aid the model in generalizing to diverse user inquiries in dialogue. In the competition, where 14 teams participated, our approach ranked 5th on the exact matching metric.
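
To make the idea of typo noise injection concrete, here is a minimal sketch of one possible noise type: swapping a pair of adjacent letters. The abstract mentions five typo noise injections but does not list them, so the function names `swap_typo` and `augment` and the choice of an adjacent-letter swap are assumptions for illustration, not the authors' method. The back-translation step is not shown.

```python
import random
from typing import List


def swap_typo(text: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent letters (hypothetical noise type)."""
    # Candidate positions: both characters are letters, so word boundaries stay intact.
    positions = [
        i for i in range(len(text) - 1)
        if text[i].isalpha() and text[i + 1].isalpha()
    ]
    if not positions:
        return text  # nothing to swap, e.g. a single-character input
    i = rng.choice(positions)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def augment(inquiry: str, n_variants: int = 3, seed: int = 0) -> List[str]:
    """Produce several noised copies of one user inquiry."""
    rng = random.Random(seed)
    return [swap_typo(inquiry, rng) for _ in range(n_variants)]
```

Calling `augment("is there parking at the hotel")` returns three strings of the same length as the input, each differing from it by at most one swapped letter pair.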

2022

pdf
Multi-Type Conversational Question-Answer Generation with Closed-ended and Unanswerable Questions
Seonjeong Hwang | Yunsu Kim | Gary Geunbae Lee
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult for many domains due to the problem of data scarcity. In this paper, we introduce a novel method to synthesize data for CQA with various question types, including open-ended, closed-ended, and unanswerable questions. We design a different generation flow for each question type and effectively combine them in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while acquiring unanswerable questions. Manual inspections show that synthetic data generated with our framework have characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data show good performance, close to that of systems trained on human-annotated data.

pdf
Schema Encoding for Transferable Dialogue State Tracking
Hyunmin Jeon | Gary Geunbae Lee
Proceedings of the 29th International Conference on Computational Linguistics

Dialogue state tracking (DST) is an essential sub-task for task-oriented dialogue systems. Recent work has focused on deep neural models for DST. However, the neural models require a large dataset for training. Furthermore, applying them to another domain requires a new dataset because the neural models are generally trained to imitate the given dataset. In this paper, we propose Schema Encoding for Transferable Dialogue State Tracking (SET-DST), which is a neural DST method for effective transfer to new domains. Transferable DST could assist the development of dialogue systems even with little data in target domains. We use a schema encoder not just to imitate the dataset but to comprehend the schema of the dataset. We aim to transfer the model to new domains by encoding new schemas and using them for DST in multi-domain settings. As a result, SET-DST improved the joint accuracy by 1.46 points on MultiWOZ 2.1.

pdf
Conversational QA Dataset Generation with Answer Revision
Seonjeong Hwang | Gary Geunbae Lee
Proceedings of the 29th International Conference on Computational Linguistics

Conversational question-answer generation is a task that automatically generates a large-scale conversational question answering dataset based on input passages. In this paper, we introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations. In particular, our framework revises the extracted answers after generating questions so that answers exactly match paired questions. Experimental results show that our simple answer revision approach leads to significant improvement in the quality of synthetic data. Moreover, we prove that our framework can be effectively utilized for domain adaptation of conversational question answering.

2018

pdf
Out-of-domain Detection based on Generative Adversarial Network
Seonghan Ryu | Sangjun Koo | Hwanjo Yu | Gary Geunbae Lee
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The main goal of this paper is to develop out-of-domain (OOD) detection for dialog systems. We propose to use only in-domain (IND) sentences to build a generative adversarial network (GAN) whose discriminator generates low scores for OOD sentences. To improve basic GANs, we apply feature matching loss in the discriminator, use domain-category analysis as an additional task in the discriminator, and remove the biases in the generator. Thereby, we reduce the huge effort of collecting OOD sentences for training OOD detection. For evaluation, we experimented with OOD detection on a multi-domain dialog system. The experimental results showed that the proposed method was more accurate than the existing methods.

2015

pdf
Exploiting knowledge base to generate responses for natural language dialog listening agents
Sangdo Han | Jeesoo Bang | Seonghan Ryu | Gary Geunbae Lee
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Conversational Knowledge Teaching Agent that uses a Knowledge Base
Kyusong Lee | Paul Hongsuck Seo | Junhwi Choi | Sangjun Koo | Gary Geunbae Lee
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Question Answering System using Multiple Information Source and Open Type Answer Merge
Seonyeong Park | Soonchoul Kwon | Byungsoo Kim | Sangdo Han | Hyosup Shim | Gary Geunbae Lee
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

pdf
POSTECH Grammatical Error Correction System in the CoNLL-2014 Shared Task
Kyusong Lee | Gary Geunbae Lee
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

2013

pdf
Counseling Dialog System with 5W1H Extraction
Sangdo Han | Kyusong Lee | Donghyeon Lee | Gary Geunbae Lee
Proceedings of the SIGDIAL 2013 Conference

2012

pdf
A Hierarchical Domain Model-Based Multi-Domain Selection Framework for Multi-Domain Dialog Systems
Seonghan Ryu | Donghyeon Lee | Injae Lee | Sangdo Han | Gary Geunbae Lee | Myungjae Kim | Kyungduk Kim
Proceedings of COLING 2012: Posters

pdf bib
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Gary Geunbae Lee | Jonathan Ginzburg | Claire Gardent | Amanda Stent
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Grammatical Error Annotation for Korean Learners of Spoken English
Hongsuck Seo | Kyusong Lee | Gary Geunbae Lee | Soo-Ok Kweon | Hae-Ri Kim
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The goal of our research is to build a grammatical error-tagged corpus for Korean learners of spoken English, dubbed the Postech Learner Corpus. We collected raw story-telling speech from Korean university students. Transcription and annotation using the Cambridge Learner Corpus tagset were performed by six Korean annotators fluent in English. For the annotation of the corpus, we developed an annotation tool and a validation tool. After comparing human annotation with machine-recommended error tags, unmatched errors were rechecked by a native annotator. We observed different characteristics between the spoken language corpus built in this study and an existing written language corpus.

pdf bib
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Seokhwan Kim | Gary Geunbae Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
A Meta Learning Approach to Grammatical Error Correction
Hongsuck Seo | Jonghoon Lee | Seokhwan Kim | Kyusong Lee | Sechun Kang | Gary Geunbae Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf
A Cross-lingual Annotation Projection-based Self-supervision Approach for Open Information Extraction
Seokhwan Kim | Minwoo Jeong | Jonghoon Lee | Gary Geunbae Lee
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
POMY: A Conversational Virtual Environment for Language Learning in POSTECH
Hyungjong Noh | Kyusong Lee | Sungjin Lee | Gary Geunbae Lee
Proceedings of the SIGDIAL 2011 Conference

2010

pdf
A Cross-lingual Annotation Projection Approach for Relation Detection
Seokhwan Kim | Minwoo Jeong | Jonghoon Lee | Gary Geunbae Lee
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf
Automatic Agenda Graph Construction from Human-Human Dialogs using Clustering Method
Cheongjae Lee | Sangkeun Jung | Kyungduk Kim | Gary Geunbae Lee
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
A Local Tree Alignment-based Soft Pattern Matching Approach for Information Extraction
Seokhwan Kim | Minwoo Jeong | Gary Geunbae Lee
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
Hybrid Approach to User Intention Modeling for Dialog Simulation
Sangkeun Jung | Cheongjae Lee | Kyungduk Kim | Gary Geunbae Lee
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf
Realistic Grammar Error Simulation using Markov Logic
Sungjin Lee | Gary Geunbae Lee
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf
Efficient Inference of CRFs for Large-Scale Natural Language Data
Minwoo Jeong | Chin-Yew Lin | Gary Geunbae Lee
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations
Gary Geunbae Lee | Sabine Schulte im Walde
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

pdf
Semi-supervised Speech Act Recognition in Emails and Forums
Minwoo Jeong | Chin-Yew Lin | Gary Geunbae Lee
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf
POSTECH machine translation system for IWSLT 2008 evaluation campaign.
Jonghoon Lee | Gary Geunbae Lee
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the POSTECH system for the IWSLT 2008 evaluation campaign. The system is based on phrase-based statistical machine translation. We set up a baseline system using well-known, freely available software. A preprocessing method and a language modeling method were applied to the baseline system to improve machine translation quality. The preprocessing method identifies and removes useless tokens in source texts, and the language modeling method models phrase-level n-grams. We participated in the BTEC tasks to see the effects of our methods.

pdf bib
Transformation-based Sentence Splitting method for Statistical Machine Translation
Jonghoon Lee | Donghyeon Lee | Gary Geunbae Lee
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

pdf
Robust Dialog Management with N-Best Hypotheses Using Dialog Examples and Agenda
Cheongjae Lee | Sangkeun Jung | Gary Geunbae Lee
Proceedings of ACL-08: HLT

pdf
A Frame-Based Probabilistic Framework for Spoken Dialog Management Using Dialog Examples
Kyungduk Kim | Cheongjae Lee | Sangkeun Jung | Gary Geunbae Lee
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf
An Integrated Dialog Simulation Technique for Evaluating Spoken Dialog Systems
Sangkeun Jung | Cheongjae Lee | Kyungduk Kim | Gary Geunbae Lee
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

2007

pdf
POSSLT: A Korean to English Spoken Language Translation System
Donghyeon Lee | Jonghoon Lee | Gary Geunbae Lee
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

pdf
A Joint Statistical Model for Simultaneous Word Spacing and Spelling Error Correction for Korean
Hyungjong Noh | Jeong-Won Cha | Gary Geunbae Lee
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf
Exploiting Non-Local Features for Spoken Language Understanding
Minwoo Jeong | Gary Geunbae Lee
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf
MMR-based Active Machine Learning for Bio Named Entity Recognition
Seokhwan Kim | Yu Song | Kyungduk Kim | Jeong-Won Cha | Gary Geunbae Lee
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

pdf
Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping
Seungwoo Lee | Gary Geunbae Lee
Second International Joint Conference on Natural Language Processing: Full Papers

pdf
POSBIOTM/W: A Development Workbench for Machine Learning Oriented Biomedical Text Mining System
Kyungduk Kim | Yu Song | Gary Geunbae Lee
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

pdf bib
MMR-based Feature Selection for Text Categorization
Changki Lee | Gary Geunbae Lee
Proceedings of HLT-NAACL 2004: Short Papers

pdf
POSBIOTM-NER in the Shared Task of BioNLP/NLPBA2004
Yu Song | Eunju Kim | Gary Geunbae Lee | Byoung-kee Yi
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

pdf
Using Higher-level Linguistic Knowledge for Speech Recognition Error Correction in a Spoken Q/A Dialog
Minwoo Jeong | Byeongchang Kim | Gary Geunbae Lee
Proceedings of the HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing

2003

pdf
Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web
Joohui An | Seungwoo Lee | Gary Geunbae Lee
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf bib
Multilingual Question Answering with High Portability on Relational Databases
Hanmin Jung | Gary Geunbae Lee
COLING-02: Multilingual Summarization and Question Answering

pdf
Syllable-Pattern-Based Unknown-Morpheme Segmentation and Estimation for Hybrid Part-of-Speech Tagging of Korean
Gary Geunbae Lee | Jeongwon Cha | Jong-Hyeok Lee
Computational Linguistics, Volume 28, Number 1, March 2002

2001

pdf
Automatic Corpus-based Tone Prediction using K-ToBI Representation
Jin-Seok Lee | Byeongchang Kim | Gary Geunbae Lee
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

pdf bib
MAYA: A Fast Question-answering System Based on a Predictive Answer Indexer
Harksoo Kim | Kyungsun Kim | Gary Geunbae Lee | Jungyun Seo
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

2000

pdf
Corpus-Based Learning of Compound Noun Indexing
Byung-Kwan Kwak | Jee-Hyub Kim | Geunbae Lee | Jung Yun Seo
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval

pdf
Structural disambiguation of morpho-syntactic categorial parsing for Korean
Jeongwon Cha | Geunbae Lee
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

pdf
Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean
Byeongchang Kim | Geunbae Lee
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

pdf
POSCAT: A Morpheme-based Speech Corpus Annotation Tool
Byeongchang Kim | Jin-seok Lee | Jeongwon Cha | Geunbae Lee
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

pdf
Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS
Byeongchang Kim | WonIl Lee | Geunbae Lee | Jong-Hyeok Lee
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf
Identifying Syntactic Role of Antecedent in Korean Relative Clause using Corpus and Thesaurus Information
Hui-Feng Li | Jong-Hyeok Lee | Geunbae Lee
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf
Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS
Byeongchang Kim | WonIl Lee | Geunbae Lee | Jong-Hyeok Lee
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf
Identifying Syntactic Role of Antecedent in Korean Relative Clause Using Corpus and Thesaurus Information
Hui-Feng Li | Jong-Hyeok Lee | Geunbae Lee
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf
Generalized unknown morpheme guessing for hybrid POS tagging of Korean
Jeongwon Cha | Geunbae Lee | Jong-Hyeok Lee
Sixth Workshop on Very Large Corpora

1994

pdf
Table-driven Neural Syntactic Analysis of Spoken Korean
WonIl Lee | Geunbae Lee | Jong-Hyeok Lee
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics