Wei Ni
2026
Agent-based Substructure Counting under Local Differential Privacy
Yuting Zhang | Kai Wang | Wei Ni | Ying Zhang | Wenjie Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yuting Zhang | Kai Wang | Wei Ni | Ying Zhang | Wenjie Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent studies have demonstrated the ability of Large Language Models (LLMs) in processing various graph problems. Substructure counting remains challenging in both scalability and accuracy. Incorporating sensitive edge information into the input prompts also introduces significant privacy risks of exposing the private information of user connections in real-world applications. This paper, for the first time, studies substructure counting for LLMs under edge local differential privacy (LDP) in a multi-agent framework. Unlike the Naive approach whose estimation relies entirely on overly dense noisy graphs, the proposed PSC framework decomposes substructure counting into node-level tasks distributed among node agents, and embeds the knowledge of distributed algorithms and DP frameworks in the curator agent and privacy controller, respectively. Thus, we can leverage the local neighboring information and reasoning capabilities of node agents to improve the estimation accuracy. Extensive experiments on 6 real-world datasets validate the effectiveness of PSC framework for substructure counting tasks under 𝜀-edge LDP. Moreover, the non-DP version of PSC also demonstrated superior performance over a single LLM on standard substructure counting tasks.
2025
pFedRAG: A Personalized Federated Retrieval-Augmented Generation System with Depth-Adaptive Tiered Embedding Tuning
Hangyu He | Xin Yuan | Kai Wu | Ren Ping Liu | Wei Ni
Findings of the Association for Computational Linguistics: EMNLP 2025
Hangyu He | Xin Yuan | Kai Wu | Ren Ping Liu | Wei Ni
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) can undergo hallucinations in specialized domains, and standard Retrieval-Augmented Generation (RAG) often falters due to general-purpose embeddings ill-suited for domain-specific terminology. Though domain-specific fine-tuning enhances retrieval, centralizing data introduces privacy risks. The use of federated learning (FL) can alleviate this to some extent, but faces challenges of data heterogeneity, poor personalization, and expensive training data generation. We propose pFedRAG, a novel Personalized Federated RAG framework, which enables efficient collaborative fine-tuning of embedding models to address these challenges. The key contribution is a new Depth-Adaptive Tiered Embedding (DATE) architecture, which comprises a Global Shared Layer, combined using FL to capture common knowledge, and a Personalized Layer with adjustable depth tailored for local data and training results of each client. The depth is locally controlled based on crafted metrics and scoring criteria. Also, pFedRAG incorporates a fully client-side pipeline leveraging local small LLMs and vector database filtering to construct high-quality query-document pairs. Experiments on diverse medical non-IID document datasets demonstrate that pFedRAG significantly reduces communication costs, handles data heterogeneity, and improves retrieval performance. Human evaluations confirm the enhanced response quality of pFedRAG.