Sumin An


2025

ReTAG: Retrieval-Enhanced, Topic-Augmented Graph-Based Global Sensemaking
Boyoung Kim | Dosung Lee | Sumin An | Jinseong Jeong | Paul Hongsuck Seo
Findings of the Association for Computational Linguistics: EMNLP 2025

Recent advances in question answering have led to substantial progress in tasks such as multi-hop reasoning. However, global sensemaking—answering questions by synthesizing information from an entire corpus—remains a significant challenge. A prior graph-based approach to global sensemaking lacks retrieval mechanisms and topic specificity, and incurs high inference costs. To address these limitations, we propose ReTAG, a Retrieval-Enhanced, Topic-Augmented Graph framework that constructs topic-specific subgraphs and retrieves the relevant summaries for response generation. Experiments show that ReTAG improves response quality while significantly reducing inference time compared to the baseline. Our code is available at https://github.com/bykimby/retag.
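As a rough illustration of the pipeline the abstract describes, below is a minimal Python sketch of retrieving query-relevant summaries from topic-specific subgraphs before generation. Everything here (the embed stand-in, the topic_subgraph_summaries structure, the answer function) is a hypothetical placeholder, not the actual ReTAG implementation from the linked repository.

# Illustrative sketch only: toy retrieval over topic-specific subgraph summaries.
# All names and data are hypothetical, not the ReTAG codebase.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a learned sentence embedder.

    Deterministic within a process (Python string hashing is salted per run)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Topic-specific subgraphs: each topic maps to summaries of its communities.
topic_subgraph_summaries = {
    "finance": ["Q3 revenue grew across all regions.", "Debt was refinanced in May."],
    "hiring":  ["Engineering headcount doubled.", "A new design team was formed."],
}

def answer(query: str, top_k: int = 2) -> list[str]:
    """Score every subgraph summary against the query, keep only the top-k."""
    q = embed(query)
    scored = [
        (float(q @ embed(s)), s)
        for summaries in topic_subgraph_summaries.values()
        for s in summaries
    ]
    scored.sort(reverse=True)
    # In a full system, the retrieved summaries would be passed to an LLM
    # to generate the final global-sensemaking response.
    return [s for _, s in scored[:top_k]]

print(answer("How did the company grow financially?"))

The point of this structure is that only the top-scoring summaries reach the generation step, rather than every community summary in the corpus graph, which is where the abstract's inference-time savings over an exhaustive graph-summarization baseline would come from.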

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An | Junyoung Sung | Wonpyo Park | Chanjun Park | Paul Hongsuck Seo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sequences increases quadratically, making it challenging to extend context length. To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables the efficient processing of long-form sequences beyond the model’s length limit through recurrent compression without retraining the entire model. We further introduce query-dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves the LLM’s ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.
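To make the recurrence concrete, here is a minimal numpy sketch of compressing a long token sequence chunk by chunk into a fixed-size memory, then reweighting that memory by query relevance. The weight matrix, slot counts, and both functions are hypothetical illustrations loosely in the spirit of LCIRC, not the paper's architecture.

# Illustrative sketch only: recurrent compression into a fixed-size memory,
# followed by query-dependent selection. All names and sizes are hypothetical.
import numpy as np

DIM, MEM_SLOTS, CHUNK = 64, 8, 128  # hidden size, memory slots, tokens per chunk

rng = np.random.default_rng(0)
W = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)  # stand-in for learned weights

def compress_chunk(memory: np.ndarray, chunk: np.ndarray) -> np.ndarray:
    """Fold one chunk into the memory via attention-style pooling."""
    scores = memory @ W @ chunk.T                       # (MEM_SLOTS, chunk_len)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return memory + attn @ chunk                        # residual update keeps history

def query_dependent_select(memory: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Reweight memory slots by relevance to the query before injection."""
    rel = memory @ query
    weights = np.exp(rel - rel.max())
    weights /= weights.sum()
    return memory * weights[:, None]

tokens = rng.standard_normal((1000, DIM))               # context beyond the length limit
memory = np.zeros((MEM_SLOTS, DIM))
for start in range(0, len(tokens), CHUNK):              # recur over chunks
    memory = compress_chunk(memory, tokens[start:start + CHUNK])

query = rng.standard_normal(DIM)
injected = query_dependent_select(memory, query)        # fed to the frozen LLM
print(injected.shape)                                   # (8, 64)

Because each chunk is folded into a memory of constant size, the cost grows linearly with sequence length rather than quadratically, and the base model itself is never retrained; only the compression module would carry learned parameters, which matches the efficiency argument in the abstract.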