Guangya Yu
2026
LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text Generation
Yupian Lin | Guangya Yu | Cheng Yuan | Huan Du | Hui Luo | Yuang Bian | Jingping Liu | Zhidong He | Wen Du | Tong Ruan
Findings of the Association for Computational Linguistics: EACL 2026
Logical table-to-text generation aims to generate natural language descriptions that fluently and precisely describe a given table with both surface-level and logic-level fidelity. Although large language models (LLMs) have demonstrated strong capabilities on plain text, their proficiency in interpreting and reasoning over tabular data is still limited. In this paper, we are the first to comprehensively explore the performance of various LLMs on the logical table-to-text generation task. We find that existing LLMs struggle to achieve satisfactory results on this task. Even worse, existing prompting strategies cannot cope with complex non-chain logical reasoning scenarios on tables. To address these challenges, we construct a new table-related instruction dataset called LogicTableInstruct and instruction-tune an open-source LLM on this dataset, resulting in a specialized LLM (LogicTableLLaMA-3.1-8B) for table-related tasks. We also introduce a novel reasoning method, Logic Tree-of-Program (LogicToP), to improve the logical reasoning ability of LLMs on tables. Our extensive experiments on various LLMs demonstrate that LogicToP effectively improves the performance of LLMs on this task. Our LogicTableLLaMA-3.1-8B model in the 5-shot LogicToP setting achieves state-of-the-art results on the Logic2Text dataset. The code and data will be released at https://github.com/FXLP/LogToP to support future work on table-related tasks.
AIDA-SEAT: Towards Reliable AI Doctor Assistant via State-Evaluation-Action Tree Enhanced LLMs in Online Hospital
Lianxin Sun | Xiaoying Ying | Guangya Yu | Weiyan Zhang | Chenhao Guan | Hao He | Mingxi SHANG | Jianhua Li | ChunMing Wang | Tong Ruan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Artificial intelligence doctor assistants (AIDAs) help streamline clinical decision-making and reduce physician workload. Existing systems primarily rely on Large Language Models (LLMs) or retrieval-augmented generation (RAG), but these methods typically retrieve static facts, whether as text passages or structured graphs, and lack the explicit logical pathways essential for multi-step reasoning. In this paper, we propose the AIDA-SEAT framework to provide reliable clinical decision-making support. First, we design the state-evaluation-action tree (SEAT), which covers diagnosis, treatment, and examination. To develop this tree, we refine and transform SEATs collected from medical documents and doctors. Then, we propose an adaptive method to select the optimal trees for the current patient’s state. Finally, we leverage LLMs to perform state assessment, evaluation, and action execution based on the tree, thereby generating reliable responses. To evaluate the effectiveness of our method, we conduct extensive experiments on a self-built dataset. Our method outperforms current state-of-the-art (SOTA) baselines, including common RAG-based methods, by 1.01% across five departments. Furthermore, an analysis of 200 consultation records collected during deployment in an online hospital shows that system-assisted responses are 24.16 seconds faster on average than manual ones, improving efficiency by 26.85%.
2025
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu | Yanhao Li | Zongying Jiang | Yuxiong Jin | Li Dai | Yupian Lin | Ruihui Hou | Weiyan Zhang | Yongqi Fan | Qi Ye | Jingping Liu | Tong Ruan
Findings of the Association for Computational Linguistics: ACL 2025
Medical quality control indicators are essential for assessing the qualifications of healthcare institutions for medical services. With the impressive performance of large language models (LLMs) like GPT-4 in the medical field, leveraging these technologies for Medical Quality Control Indicator Calculation (MQCIC) is a promising approach. In this work, (1) we introduce the real-world task MQCIC and propose an open-source dataset based on Chinese electronic medical records (EMRs), CMQCIC-Bench, comprising 785 instances and 76 indicators. (2) We propose a semi-automatic method to enhance the rule representation. We then propose the Clinical Facts-based Inferential Rule (CF-IR) method, which disentangles clinical fact verification from inferential rule reasoning. (3) We conduct comprehensive experiments on 20 representative LLMs, covering general and medical models. Our findings reveal that CF-IR outperforms Chain-of-Thought methods on MQCIC tasks. (4) We conduct an error analysis and investigate the capabilities of clinical fact verification and inferential rule reasoning, providing insights to further improve performance on MQCIC. The dataset and code are available at https://github.com/YuY-2001/C-MQCIC.
Text-to-ES Bench: A Comprehensive Benchmark for Converting Natural Language to Elasticsearch Query
Dongge Xue | Zhili Pu | Zhentao Xia | Hongli Sun | Ruihui Hou | Guangya Yu | Yupian Lin | Yongqi Fan | Jingping Liu | Tong Ruan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Elasticsearch (ES) is a distributed RESTful search engine optimized for large-scale and long-text search scenarios. Recent research on text-to-Query has explored using large language models (LLMs) to convert user query intent into executable code, making it an increasingly popular research topic. To our knowledge, we are the first to introduce the novel semantic parsing task text-to-ES. To bridge the gap between LLMs and ES, we leverage LLMs and employ domain experts to generate ES query bodies, which are written in a Domain-Specific Language (DSL), along with the corresponding post-processing code to support multi-index ES queries. Consequently, we propose the text-to-ES benchmark, which consists of two datasets: Large Elasticsearch Dataset (LED), containing 26,207 text-ES pairs derived from a 224.9GB schema-free database, and ElasticSearch (BirdES), with 10,926 pairs sourced from the Bird dataset on a 33.4GB schema-fixed database. In a comparison against fourteen advanced LLMs and six code-based LLMs, the model we trained outperforms DeepSeek-R1 by 15.64% on the LED dataset, setting a new state of the art, and achieves 78% of DeepSeek-R1’s performance on the BirdES dataset. Additionally, we provide in-depth experimental analyses and suggest future research directions for this task. Our datasets are available at https://huggingface.co/datasets/Barry1915/Text-to-ES.
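To make the text-to-ES task concrete, the sketch below pairs a natural language question with an ES query body of the kind such a system would need to produce. It is an illustration only and is not drawn from the benchmark: the question, the field names (`content`, `publish_date`), and the helper `build_query` are all hypothetical. The query itself uses only standard Elasticsearch Query DSL constructs (`bool`, `match`, `range`).

```python
import json


def build_query(field: str, text: str, date_field: str, since: str, size: int = 10) -> dict:
    """Return an ES search body: a full-text match combined with a date filter.

    Hypothetical helper for illustration; field names are assumptions.
    """
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score
                "must": [{"match": {field: text}}],
                # "filter" clauses restrict results without affecting the score
                "filter": [{"range": {date_field: {"gte": since}}}],
            }
        },
    }


# Hypothetical natural language input and the query body it maps to.
question = "Find the ten most relevant articles about diabetes published since 2020"
body = build_query("content", "diabetes", "publish_date", "2020-01-01")

# The serialized body is what would be sent to an index's _search endpoint.
print(json.dumps(body, indent=2))
```

The body is built as a plain dictionary so it can be inspected or post-processed before being sent to a cluster, which mirrors the abstract's mention of post-processing code alongside the query body.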