2025
ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
Dosung Lee | Wonjun Oh | Boyoung Kim | Minyoung Kim | Joonsuk Park | Paul Hongsuck Seo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-hop question answering (MHQA) involves reasoning across multiple documents to answer complex questions. In many tasks, dense retrievers typically outperform sparse methods like BM25 by leveraging semantic embeddings; however, they require labeled query-document pairs for fine-tuning, which poses a significant challenge in MHQA due to the complexity of the reasoning steps. To overcome this limitation, we introduce Retriever Supervision with Consistency and Relevance (ReSCORE), a novel method for training dense retrievers for MHQA without the need for labeled documents. ReSCORE leverages large language models to measure document-question relevance along with answer consistency, and uses this information to train a retriever within an iterative question-answering framework. Evaluated on three MHQA benchmarks, our extensive experiments demonstrate the effectiveness of ReSCORE, with significant improvements in retrieval performance that consequently lead to state-of-the-art Exact Match and F1 scores for MHQA.
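The abstract describes scoring retrieved documents with an LLM and distilling those scores into a dense retriever. Below is a minimal sketch of that general idea, assuming a dual-encoder retriever and LLM log-likelihoods as the relevance and consistency signals; all names and the loss formulation here are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of LLM-supervised retriever training (hypothetical names;
# not the ReSCORE implementation).
import torch
import torch.nn.functional as F

def pseudo_labels(relevance_logliks, consistency_logliks):
    """Combine per-document relevance (e.g., LLM log-likelihood of the question
    given the document) and consistency (LLM log-likelihood of the gold answer
    given question + document) into a soft target distribution over candidates."""
    scores = torch.tensor(relevance_logliks) + torch.tensor(consistency_logliks)
    return F.softmax(scores, dim=-1)

def retriever_loss(query_emb, doc_embs, targets):
    """KL divergence between the retriever's similarity distribution over the
    candidate documents and the LLM-derived soft labels."""
    logits = doc_embs @ query_emb                      # dot-product similarities
    return F.kl_div(F.log_softmax(logits, dim=-1), targets, reduction="sum")

# Toy example: 4 candidate documents, 128-dim embeddings, made-up log-likelihoods.
query_emb = torch.randn(128, requires_grad=True)
doc_embs = torch.randn(4, 128, requires_grad=True)
targets = pseudo_labels([-5.1, -2.3, -7.8, -4.0], [-1.2, -0.4, -3.3, -2.0])
retriever_loss(query_emb, doc_embs, targets).backward()
```

In the full iterative framework the retrieve-score-update cycle would repeat across reasoning hops; the sketch shows only a single update on one candidate set.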
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An | Junyoung Sung | Wonpyo Park | Chanjun Park | Paul Hongsuck Seo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
While large language models (LLMs) excel at generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sequences grows quadratically with length, making it challenging to extend the context window. To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables efficient processing of long-form sequences beyond the model’s length limit through recurrent compression, without retraining the entire model. We further introduce query-dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves the LLM’s ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.
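As a rough illustration of recurrent compression, the sketch below folds cross-attention over fixed-size segments so that a constant number of memory slots summarizes an arbitrarily long context. The class, slot count, and segment length are assumptions for exposition, not the paper's architecture.

```python
# Minimal sketch of recurrent context compression (hypothetical module; not the
# LCIRC implementation).
import torch
import torch.nn as nn

class RecurrentCompressor(nn.Module):
    """Compresses an arbitrarily long token sequence into a fixed number of
    memory slots by applying cross-attention segment by segment."""
    def __init__(self, dim=256, num_slots=16, segment_len=512):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.segment_len = segment_len

    def forward(self, token_embs):                     # (seq_len, dim)
        memory = self.slots.unsqueeze(0)               # (1, num_slots, dim)
        for start in range(0, token_embs.size(0), self.segment_len):
            seg = token_embs[start:start + self.segment_len].unsqueeze(0)
            # The running memory attends to itself plus the next segment, so
            # information accumulates recurrently in a fixed-size state.
            keys = torch.cat([memory, seg], dim=1)
            memory, _ = self.attn(memory, keys, keys)
        return memory.squeeze(0)                       # (num_slots, dim)

compressor = RecurrentCompressor()
long_context = torch.randn(4096, 256)   # stand-in for token embeddings
compressed = compressor(long_context)   # 16 slot vectors to inject into the LLM
```

A query-dependent variant in the spirit of QD-LCIRC could additionally condition the compression on the query, e.g., by appending a query embedding to the attention keys so query-relevant content is preferentially retained.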
2022
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
Heuiseok Lim | Seungryong Kim | Yeonsoo Lee | Steve Lin | Paul Hongsuck Seo | Yumin Suh | Yoonna Jang | Jungwoo Lim | Yuna Hur | Suhyune Son
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
2015
Conversational Knowledge Teaching Agent that uses a Knowledge Base
Kyusong Lee | Paul Hongsuck Seo | Junhwi Choi | Sangjun Koo | Gary Geunbae Lee
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue
2012
Grammatical Error Annotation for Korean Learners of Spoken English
Hongsuck Seo | Kyusong Lee | Gary Geunbae Lee | Soo-Ok Kweon | Hae-Ri Kim
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The goal of our research is to build a grammatical error-tagged corpus for Korean learners of spoken English, dubbed the Postech Learner Corpus. We collected raw story-telling speech from Korean university students. Transcription and annotation using the Cambridge Learner Corpus tagset were performed by six Korean annotators fluent in English. For the annotation of the corpus, we developed an annotation tool and a validation tool. After comparing the human annotations with machine-recommended error tags, unmatched errors were rechecked by a native annotator. We observed different characteristics between the spoken language corpus built in this study and an existing written language corpus.
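The validation step mentioned above (comparing human tags against machine-recommended tags and flagging mismatches for a native annotator to recheck) amounts to a simple set comparison; here is a toy sketch where the data layout and tag codes are invented for illustration.

```python
# Hypothetical helper for the recheck step: collect annotations where the
# human tag and the machine-recommended tag disagree.
def unmatched_errors(human_tags, machine_tags):
    """Each input maps (sentence_id, token_span) -> error-tag code."""
    return [
        (key, h_tag, machine_tags.get(key))
        for key, h_tag in human_tags.items()
        if machine_tags.get(key) != h_tag
    ]

# Invented example data; tag codes are placeholders, not the actual tagset.
human = {("s01", (0, 2)): "TAG_A", ("s01", (4, 5)): "TAG_B"}
machine = {("s01", (0, 2)): "TAG_A", ("s01", (4, 5)): "TAG_C"}
print(unmatched_errors(human, machine))
# -> [(('s01', (4, 5)), 'TAG_B', 'TAG_C')]
```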
A Meta Learning Approach to Grammatical Error Correction
Hongsuck Seo | Jonghoon Lee | Seokhwan Kim | Kyusong Lee | Sechun Kang | Gary Geunbae Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)