2025
TC–RAG: Turing–Complete RAG’s Case study on Medical LLM Systems
Xinke Jiang | Yue Fang | Rihong Qiu | Haoyu Zhang | Yongxin Xu | Hao Chen | Wentao Zhang | Ruizhe Zhang | Yuchen Fang | Xinyu Ma | Xu Chu | Junfeng Zhao | Yasha Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries. However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retrieval halting, and system convergence. In this paper, we introduce Turing-Complete-RAG (TC-RAG), a novel framework, supported by rigorous proof, that addresses these challenges by incorporating a Turing Complete System to manage state variables, thereby enabling more efficient and accurate knowledge retrieval. By leveraging a memory stack system with adaptive retrieval, reasoning, and planning capabilities, TC-RAG not only ensures the controlled halting of retrieval processes but also mitigates the accumulation of erroneous knowledge via Push and Pop actions. In a case study of the medical and general domains, our extensive experiments on seven real-world healthcare and general-domain datasets demonstrate the superiority of TC-RAG over existing methods in accuracy by over 7.20%. Our code, datasets and RAG resources are available at https://github.com/Artessay/TC-RAG.
HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses
Xinke Jiang | Ruizhe Zhang | Yongxin Xu | Rihong Qiu | Yue Fang | Zhiyuan Wang | Jinyi Tang | Hongxin Ding | Xu Chu | Junfeng Zhao | Yasha Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we investigate retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework, which leverages LLMs’ powerful reasoning capacity to compensate for the incompleteness of user queries, optimizes the interaction process with LLMs, and provides diverse retrieved knowledge. Specifically, HyKGE explores the zero-shot capability and the rich knowledge of LLMs with Hypothesis Outputs to extend feasible exploration directions in the KGs, and uses a carefully curated prompt to enhance the density and efficiency of LLMs’ responses. Furthermore, we introduce the HO Fragment Granularity-aware Rerank Module to filter out noise while ensuring the balance between diversity and relevance in retrieved knowledge. Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability. Code is available at https://github.com/Artessay/HyKGE.
3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection
Hongxin Ding | Yue Fang | Runchuan Zhu | Xinke Jiang | Jinyang Zhang | Yongxin Xu | Weibin Liao | Xu Chu | Junfeng Zhao | Yasha Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) excel in general language tasks, motivating their adaptation to specialized domains such as healthcare. Effective domain adaptation typically involves supervised fine-tuning (SFT) on carefully selected instruction-tuning data. Current data selection methods adopt a data-centric approach, relying on external annotations and heuristics to identify externally defined high-quality or challenging data. Our exploratory experiments show that this approach fails to improve the model’s domain performance, due to misalignment between the selected data and the model’s knowledge distribution. To tackle this, we propose Decomposed Difficulty-based Data Selection (3DS), a two-stage model-centric data selection framework that aligns data selection with the model’s distribution. In Stage#1, 3DS employs Prompt-Driven Data Selection to filter out noise based on the model’s knowledge via explicit alignment. In Stage#2, it adopts Decomposed Difficulty-based Data Selection to guide selection via three novel data difficulty metrics (Instruction Understanding, Response Confidence, and Response Correctness), enhanced by an attention-based importance weighting mechanism for accurate calibration. Extensive experiments in the healthcare domain show 3DS outperforms existing methods by up to 2.97% accuracy, with additional validation in law and general domains confirming its generalization ability. Our dataset and code are open-sourced at https://github.com/PuppyKnightUniversity/3DS.
NL2Lean: Translating Natural Language into Lean 4 through Multi-Aspect Reinforcement Learning
Yue Fang | Shaohan Huang | Xin Yu | Haizhen Huang | Zihan Zhang | Weiwei Deng | Furu Wei | Feng Sun | Qi Zhang | Zhi Jin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Translating natural language into formal language such as Lean 4 has gained attention for its potential to automate formal proof development. Automated methods provide a scalable and cost-effective alternative to manual formalization, driving increasing interest in this task. However, existing LLMs mainly rely on instruction tuning and lack fine-grained structural and semantic alignment, making it difficult to generate syntactically and logically sound formal proofs. To address this, we propose a reinforcement learning framework ReLean that enables LLMs to generate high-quality Lean 4 statements from natural language. We first fine-tune a LLaMA3-8B model on NL–Lean 4 data to obtain a base translator with basic translation ability. Then, we design a multi-aspect dense reward mechanism covering four key dimensions: semantic alignment, term-level alignment, global-level alignment, and compile-checking. Separate reward models are trained via preference modeling, and their normalized outputs are combined to guide optimization via PPO. Finally, a curriculum learning strategy based on multi-dimensional difficulty allows the model to learn progressively from simple to complex cases. Experiments on NL-to-Lean 4 tasks show that our method consistently outperforms baseline models. Further analysis on reward model and curriculum learning confirms their effectiveness in enhancing model performance.
Enhancing Transformation from Natural Language to Signal Temporal Logic Using LLMs with Diverse External Knowledge
Yue Fang | Zhi Jin | Jie An | Hongshen Chen | Xiaohong Chen | Naijun Zhan
Findings of the Association for Computational Linguistics: ACL 2025
Temporal Logic (TL), especially Signal Temporal Logic (STL), enables precise formal specification, making it widely used in cyber-physical systems such as autonomous driving and robotics. Automatically transforming natural language (NL) into STL is an attractive approach to overcome the limitations of manual transformation, which is time-consuming and error-prone. However, due to the lack of datasets, automatic transformation currently faces significant challenges and has not been fully explored. In this paper, we propose an NL-STL dataset named STL-Diversity-Enhanced (STL-DivEn), comprising 16,000 samples enriched with diverse patterns. To develop the dataset, we first manually create a small-scale seed set of NL-STL pairs. Next, representative examples are identified through clustering and used to guide large language models (LLMs) in generating additional NL-STL pairs. Finally, diversity and accuracy are ensured through rigorous rule-based filters and human validation. Furthermore, we introduce the Knowledge-Guided STL Transformation (KGST) framework, a novel approach for transforming natural language into STL, involving a generate-then-refine process based on external knowledge. Statistical analysis shows that the STL-DivEn dataset exhibits more diversity than the existing NL-STL dataset. Moreover, both metric-based and human evaluations indicate that our KGST approach outperforms baseline models in transformation accuracy on STL-DivEn and DeepSTL datasets.
2022
From spoken dialogue to formal summary: An utterance rewriting for dialogue summarization
Yue Fang | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Bo Long | Yanyan Lan | Yanquan Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Because dialogues have unstructured contexts and involve multiple parties speaking from a first-person perspective, many successful text summarization methods fail when applied to dialogue summarization. In the dialogue summarization task, the input dialogue is usually in a spoken style with ellipsis and co-references, but the output summaries are more formal and complete. Therefore, a dialogue summarization model should be able to complete the elided content and resolve co-reference information, and then produce a suitable summary accordingly. However, the current state-of-the-art models pay more attention to the topic or structure of the summary than to the consistency of the dialogue summary with its input dialogue context, and so may suffer from personal and logical inconsistency problems. In this paper, we propose a new model, named ReWriteSum, to tackle this problem. First, an utterance rewriter is applied to complete the elided content of the dialogue and obtain the rewritten utterances. Then, a co-reference data augmentation mechanism is used to replace each referential person name with its specific name to enhance the personal information. Finally, the rewritten utterances and the co-reference replacement data are used in the standard BART model. Experimental results on both SAMSum and DialSum datasets show that our ReWriteSum significantly outperforms baseline models, in terms of both metric-based and human evaluations. Further analysis on multi-speaker dialogues also shows that ReWriteSum obtains relatively higher improvement with more speakers, validating the effectiveness of ReWriteSum.