Seoho Song
2026
Relevance to Utility: Process-Supervised Rewrite for RAG
Jaeyoung Kim | Jongho Kim | Seung-won Hwang | Seoho Song | Young-In Song
Findings of the Association for Computational Linguistics: ACL 2026
Jaeyoung Kim | Jongho Kim | Seung-won Hwang | Seoho Song | Young-In Song
Findings of the Association for Computational Linguistics: ACL 2026
Retrieval-augmented generation systems often suffer from a gap between optimizing retrieval relevance and generative utility. With such a gap, retrieved documents may be topically relevant but still lack the content needed for effective reasoning during generation. While existing bridge modules attempt to rewrite the retrieved text for better generation, we show how they fail by not capturing "document utility". In this work, we propose R2U, with a key distinction of approximating true utility through joint observation of rewriting and answering in the reasoning process. To distill this observation reliably, R2U scales such supervision to enhance reliability in distillation. We further construct utility-improvement supervision by measuring the generator’s gain of the answer under the rewritten context, yielding signals for fine-tuning and preference optimization. We evaluate our method across multiple open-domain question-answering benchmarks. The empirical results demonstrate consistent improvements over strong bridging baselines.
2025
Query Variant Detection Using Retriever as Environment
Minji Seo | Youngwon Lee | Seung-won Hwang | Seoho Song | Hee-Cheol Seo | Young-In Song
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Minji Seo | Youngwon Lee | Seung-won Hwang | Seoho Song | Hee-Cheol Seo | Young-In Song
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
This paper addresses the challenge of detecting query variants—pairs of queries with identical intents. One application in commercial search engines is reformulating user queries with its variant online. While measuring pairwise query similarity has been an established standard, it often falls short of capturing semantic equivalence when word forms or order differ. We propose leveraging the retrieval as an environment feedback (EF), based on the premise that desirable retrieval outcomes from equivalent queries should be interchangeable. Experimental results on both proprietary and public datasets demonstrate the efficacy of the proposed method, both with and without LLM calls.
2022
Pseudo-Relevance for Enhancing Document Representation
Jihyuk Kim | Seung-won Hwang | Seoho Song | Hyeseon Ko | Young-In Song
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Jihyuk Kim | Seung-won Hwang | Seoho Song | Hyeseon Ko | Young-In Song
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
This paper studies how to enhance the document representation for the bi-encoder approach in dense document retrieval. The bi-encoder, separately encoding a query and a document as a single vector, is favored for high efficiency in large-scale information retrieval, compared to more effective but complex architectures. To combine the strength of the two, the multi-vector representation of documents for bi-encoder, such as ColBERT preserving all token embeddings, has been widely adopted. Our contribution is to reduce the size of the multi-vector representation, without compromising the effectiveness, supervised by query logs. Our proposed solution decreases the latency and the memory footprint, up to 8- and 3-fold, validated on MSMARCO and real-world search query logs.