Cheng Yuan
2026
LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text Generation
Yupian Lin | Guangya Yu | Cheng Yuan | Huan Du | Hui Luo | Yuang Bian | Jingping Liu | Zhidong He | Wen Du | Tong Ruan
Findings of the Association for Computational Linguistics: EACL 2026
Logical table-to-text generation aims to generate natural language descriptions that fluently and precisely describe a given table with both surface-level and logic-level fidelity. Although large language models (LLMs) have demonstrated strong capabilities on plain text, their proficiency in interpreting and reasoning over tabular data is still limited. In this paper, we are the first to comprehensively explore the performance of various LLMs on the logical table-to-text generation task. We find that existing LLMs struggle to achieve satisfactory results on this task. Even worse, existing prompting strategies cannot cope with complex non-chain logical reasoning scenarios on tables. To address these challenges, we construct a new table-related instruction dataset called LogicTableInstruct and instruction-tune an open-source LLM on this dataset, resulting in a specialized LLM (LogicTableLLaMA-3.1-8B) for table-related tasks. We also introduce a novel reasoning method, Logic Tree-of-Program (LogicToP), to improve the logical reasoning ability of LLMs on tables. Our extensive experiments on various LLMs demonstrate that LogicToP can effectively improve the performance of LLMs on this task. Our LogicTableLLaMA-3.1-8B model in the 5-shot LogicToP setting achieves state-of-the-art results on the Logic2Text dataset. The code and data will be released at https://github.com/FXLP/LogToP to boost future work on table-related tasks.
Enrich, Aggregate, and Generate: Three-stage Biomedical Data-to-Text Generation Using Large Language Models in Low-resource Scenarios
Yupian Lin | Guangya Yu | Yuang Bian | Cheng Yuan | Hui Luo | Tong Ruan
Findings of the Association for Computational Linguistics: ACL 2026
Biomedical data-to-text generation aims to generate natural language descriptions that fluently and precisely describe biomedical structured data. However, the task suffers from a lack of labeled data due to the privacy and scarcity of medical data. Large language models (LLMs) have demonstrated the ability to solve few-shot tasks through in-context learning (ICL). In this paper, we are the first to explore the performance of different LLMs on the biomedical data-to-text (D2T) generation task. To address the issues of semantic sparsity and misinterpretation of numerical values in biomedical structured data, we propose EAG (Enrich, Aggregate, and Generate), a simple but efficient LLM-based three-stage biomedical D2T framework for low-resource scenarios. We conduct extensive evaluations of closed-source general LLMs, open-source general LLMs, and open-source medical LLMs. The results show that the EAG framework provides good interpretability and superior performance, achieving state-of-the-art results on the BioLeaflets dataset. The code and data will be released at https://github.com/FXLP/EAG.
2025
LCDS: A Logic-Controlled Discharge Summary Generation System Supporting Source Attribution and Expert Review
Cheng Yuan | Xinkai Rui | Yongqi Fan | Yawei Fan | Boyang Zhong | Jiacheng Wang | Weiyan Zhang | Tong Ruan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Despite the remarkable performance of Large Language Models (LLMs) in automated discharge summary generation, they still suffer from generating inaccurate content or fabricating information without valid sources. To address these issues, we propose LCDS, a tool for empowering LLMs with Logic-Controlled Discharge Summary generation. LCDS constructs a source mapping table by calculating the textual similarity between electronic medical records (EMRs) and discharge summaries, providing a structured reference for generation. Based on a comprehensive set of logical rules, LCDS identifies the structured writing logic of discharge summaries and integrates it with EMRs to generate silver discharge summaries. Furthermore, LCDS traces the provenance of generated content, allowing experts to review, provide feedback, and rectify errors to produce golden discharge summaries, which are subsequently recorded for the incremental fine-tuning of LLMs. Our project and demo video are available in the GitHub repository https://github.com/ycycyc02/LCDS.