Kyungho Kim



2024

Non-Essential Is NEcessary: Order-agnostic Multi-hop Question Generation
Kyungho Kim | Seongmin Park | Junseo Lee | Jihwa Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Existing multi-hop question generation (QG) methods treat answer-irrelevant documents as non-essential and remove them as impurities. However, this approach can create a training-inference discrepancy when impurities cannot be completely removed, which can lead to a decrease in model performance. To overcome this problem, we propose an auxiliary task, called order-agnostic, which leverages non-essential data in the training phase to create a robust model and extract consistent embeddings in real-world inference environments. Additionally, we use a single LM as both ranker and generator through a prompt-based approach, without applying additional external modules. Furthermore, we discover that appropriate utilization of the non-essential components can achieve a significant performance increase. Finally, experiments on the HotpotQA dataset achieve state-of-the-art results.

RT-VQ2A2: Real Time Vector Quantized Question Answering with ASR
Kyungho Kim | Seongmin Park | Jihwa Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In Spoken Question Answering (SQA), automatic speech recognition (ASR) outputs are often relayed to language models for QA. However, constructing such a cascaded framework with large language models (LLMs) in a real-time SQA setting involves realistic challenges, such as noise in the ASR output, the limited context length of LLMs, and latency in processing large models. This paper proposes a novel model-agnostic framework, RT-VQ2A2, to address these challenges. RT-VQ2A2 consists of three steps: codebook preparation, quantized semantic vector extractor, and dual segment selector. We construct a codebook from clustering, removing outliers on a text corpus derived from ASR to mitigate the influence of ASR error. Extracting quantized semantic vectors through a pre-built codebook shows significant speed and performance improvements in relevant context retrieval. Dual segment selector considers both semantic and lexical aspects to deal with ASR error. The efficacy of RT-VQ2A2 is validated on the widely used Spoken-SQuAD dataset.

2023

Cross-task Knowledge Transfer for Extremely Weakly Supervised Text Classification
Seongmin Park | Kyungho Kim | Jihwa Lee
Findings of the Association for Computational Linguistics: ACL 2023

Text classification with extremely weak supervision (EWS) imposes stricter supervision constraints compared to regular weakly supervised classification. Absolutely no labeled training samples or hand-crafted rules specific to the evaluation data are allowed. Such restrictions limit state-of-the-art EWS classification methods to indirect weak labeling techniques that assign unnatural label uncertainty estimates. We present PLAT, a framework that creates weak labels by leveraging recent developments in zero-shot text classification. PLAT employs models trained for sub-tasks other than classification to label documents. Most importantly, PLAT refrains from assigning overly confident weak labels and improves soft-label training performance for downstream classifiers. Classifiers trained with PLAT significantly outperform those trained on weak labels generated by the previous state-of-the-art in extremely weakly supervised text classification.

2021

Query Generation for Multimodal Documents
Kyungho Kim | Kyungjae Lee | Seung-won Hwang | Young-In Song | Seungwook Lee
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper studies the problem of generating likely queries for multimodal documents with images. Our application scenario is enabling efficient “first-stage retrieval” of relevant documents, by attaching generated queries to documents before indexing. We can then index this expanded text to efficiently narrow down to candidate matches using inverted index, so that expensive reranking can follow. Our evaluation results show that our proposed multimodal representation meaningfully improves relevance ranking. More importantly, our framework can achieve the state of the art in the first stage retrieval scenarios.