Xuan Xu

2026

LLMs have become foundational across many NLP applications, driving a shift from an algorithm-centric to a context-centric paradigm. The landscape of topic modeling (TM), an important task in text mining, is similarly being reshaped by a growing body of LLM-driven research. We review recent TM developments and categorize existing methods into three groups: Classical Algorithm-Centric, LLM-Assisted, and LLM-Centric. For traditional algorithm-centric methods, we refine prior taxonomies and highlight recent advances. For the LLM-Assisted and LLM-Centric settings, we introduce a new taxonomy that emphasizes the role of LLMs and the design of end-to-end workflows, respectively. We examine two key transformations brought by LLM-centric TM: expanded task scope and a shift from model-level improvements to system-level engineering. We also propose a future roadmap for more optimized LLM-Centric TMs and identify ongoing critical challenges. We aim for this survey to spur closer integration between TM and LLMs and to further drive the progress of modern TM.

2025

Retrieval-Augmented Generation (RAG) plays a critical role in mitigating hallucinations and improving factual accuracy for Large Language Models (LLMs). While dynamic retrieval techniques aim to determine retrieval timing and content based on the model's intrinsic needs, existing approaches struggle to generalize effectively in black-box model scenarios. To address this limitation, we propose the Semantic Contribution-Aware Adaptive Retrieval (SCAAR) framework. SCAAR iteratively leverages the semantic importance of words in upcoming sentences to dynamically adjust retrieval thresholds and filter information, retaining the top-𝛼% most semantically significant words for constructing retrieval queries. We comprehensively evaluate SCAAR against baseline methods across four long-form, knowledge-intensive generation datasets using four models. Our method achieved the highest score on each dataset with GPT-4o. Extensive experiments also analyze the impact of various hyperparameters within the framework. Our results demonstrate SCAAR’s superior or competitive performance, showcasing its ability to effectively detect model retrieval needs and construct efficient retrieval queries for relevant knowledge about problem-solving in black-box scenarios. Our code is available at https://github.com/linqinhong/SAC.
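
The top-𝛼% filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_retrieval_query`, the rounding rule, and the tie-breaking are assumptions made for illustration, and the per-word importance scores are taken as given inputs because the abstract does not say how they are computed.

```python
import math


def build_retrieval_query(words, scores, alpha):
    """Keep the top-alpha% highest-scoring words, in their original order.

    Hypothetical helper: `scores` stands in for whatever semantic
    importance measure is used; it is an assumed input here.
    """
    if len(words) != len(scores):
        raise ValueError("words and scores must have the same length")
    if not 0 < alpha <= 100:
        raise ValueError("alpha must be in (0, 100]")
    if not words:
        return ""
    # Assumption: round up and always keep at least one word.
    keep = max(1, math.ceil(len(words) * alpha / 100))
    # Rank positions by descending score; ties go to the earlier word.
    ranked = sorted(range(len(words)), key=lambda i: (-scores[i], i))
    # Restore sentence order so the query stays readable.
    kept_positions = sorted(ranked[:keep])
    return " ".join(words[i] for i in kept_positions)


words = ["the", "capital", "of", "France", "is", "Paris"]
scores = [0.05, 0.6, 0.02, 0.9, 0.03, 0.8]
query = build_retrieval_query(words, scores, alpha=50)
```

With the made-up scores above and 𝛼 = 50, the three highest-scoring words are kept and low-importance function words are dropped from the query.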

Co-authors

J Song 1

Venues

Findings 2
