Ruiqing Li
2026
DFAMS: Dynamic-flow guided Federated Alignment based Multi-prototype Search
Zhibang Yang | Xinke Jiang | Rihong Qiu | Ruiqing Li | Yihang Zhang | Yue Fang | Yongxin Xu | Hongxin Ding | Xu Chu | Junfeng Zhao | Yasha Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Federated Retrieval (FR) routes queries across multiple external knowledge sources to mitigate LLM hallucinations when the necessary external knowledge is distributed. However, existing methods struggle to retrieve high-quality, relevant documents for ambiguous queries, especially in cross-domain scenarios, which significantly limits their effectiveness in supporting downstream generation tasks. Inspired by Dynamic Information Flow (DIF), we propose DFAMS, a novel framework that leverages DIF to identify latent query intents and construct semantically aligned knowledge partitions for accurate retrieval across heterogeneous sources. Specifically, DFAMS probes the DIF in LLMs by leveraging gradient signals from a few annotated queries and employing Shapley value-based attribution to trace neuron activation paths associated with intent recognition and subdomain boundary detection. DFAMS then uses DIF to train an alignment module via multi-prototype contrastive learning, enabling fine-grained intra-source modeling and inter-source semantic alignment across knowledge bases. Experimental results across five benchmarks show that DFAMS outperforms advanced FR methods by up to 14.37% in knowledge classification accuracy, 5.38% in retrieval recall, and 6.45% in downstream QA accuracy, demonstrating its effectiveness in complex FR scenarios. Our code is publicly available at https://github.com/Artessay/DFAMS.
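The idea of routing a query by multiple prototypes per knowledge source can be illustrated with a small sketch. This is not the DFAMS implementation: the function names (`cosine`, `route`), the toy 3-dimensional embeddings, and the max-over-prototypes scoring rule are all assumptions made for illustration. The DIF probing and contrastive training steps described in the abstract are not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route(query_vec, prototypes, top_k=1):
    """Rank knowledge sources by their best-matching prototype (hypothetical rule)."""
    scores = {
        source: max(cosine(query_vec, p) for p in protos)
        for source, protos in prototypes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (assumed): each source holds several prototype vectors,
# one per subdomain it covers.
prototypes = {
    "medical": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "legal": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
    "finance": [[0.0, 0.0, 1.0]],
}
query = [0.9, 0.1, 0.0]
print(route(query, prototypes, top_k=2))  # ['medical', 'legal']
```

Keeping several prototypes per source, rather than a single centroid, lets one source cover distinct subdomains without averaging them into a vector that matches none of them well.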
2024
Combating Label Sparsity in Short Text Topic Modeling via Nearest Neighbor Augmentation
Yang Lin | Xinyu Ma | Xin Gao | Ruiqing Li | Yasha Wang | Xu Chu
Findings of the Association for Computational Linguistics: ACL 2024
Extracting semantic topics from short texts presents a significant challenge in the field of data mining. While efforts have been made to mitigate the data sparsity issue, the limited length of short documents also results in the absence of semantically relevant words, causing a biased evidence lower bound and incomplete labels for likelihood maximization. We refer to this issue as the label sparsity problem. To combat this problem, we propose kNNTM, a neural short text topic model that incorporates a k-Nearest-Neighbor-based label completion algorithm, which augments the reconstruction label with the k nearest documents to supply these relevant but unobserved words. Furthermore, to reflect distances between documents precisely, we propose a fused multi-view distance metric that takes both local word similarities and global topic semantics into consideration. Extensive experiments on multiple public short-text datasets show that kNNTM outperforms state-of-the-art baseline models and can derive both high-quality topics and document representations.
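The label completion step can be pictured with a minimal sketch. It is not the kNNTM algorithm itself: plain cosine similarity over bag-of-words counts stands in for the paper's fused multi-view distance, and the names `augment_labels`, `k`, and `weight`, along with the four-word toy vocabulary, are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def augment_labels(bows, k=1, weight=0.5):
    """Add down-weighted word counts of each document's k nearest neighbors
    to its own bag-of-words vector, giving a denser reconstruction target."""
    augmented = []
    for i, doc in enumerate(bows):
        # Rank all other documents by similarity to the current one.
        neighbors = sorted(
            (j for j in range(len(bows)) if j != i),
            key=lambda j: cosine(doc, bows[j]),
            reverse=True,
        )[:k]
        target = list(doc)
        for j in neighbors:
            target = [t + weight * c for t, c in zip(target, bows[j])]
        augmented.append(target)
    return augmented

# Toy vocabulary (assumed): ["apple", "fruit", "phone", "screen"]
bows = [
    [1, 1, 0, 0],  # "apple fruit"
    [1, 0, 0, 0],  # "apple" (the related word "fruit" is unobserved)
    [0, 0, 1, 1],  # "phone screen"
]
print(augment_labels(bows, k=1, weight=0.5)[1])  # [1.5, 0.5, 0.0, 0.0]
```

In this toy run the second document gains weight on "fruit", a word it never contained, from its nearest neighbor. Note that the sketch always takes k neighbors even when the best similarity is zero, so a real implementation would likely need a similarity threshold.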