Cheng Yuan


2026

Logical table-to-text generation aims to produce natural language descriptions that fluently and precisely describe a given table with both surface-level and logic-level fidelity. Although large language models (LLMs) have demonstrated strong capabilities on plain text, their proficiency in interpreting and reasoning over tabular data is still limited. In this paper, we are the first to comprehensively explore the performance of various LLMs on the logical table-to-text generation task. We find that existing LLMs struggle to achieve satisfactory results on this task. Even worse, existing prompting strategies cannot cope with complex non-chain logical reasoning scenarios on tables. To address these challenges, we construct a new table-related instruction dataset called LogicTableInstruct and instruction-tune an open-source LLM on it, resulting in a specialized LLM (LogicTableLLaMA-3.1-8B) for table-related tasks. We also introduce a novel reasoning method, Logic Tree-of-Program (LogicToP), to improve the logical reasoning ability of LLMs on tables. Extensive experiments on various LLMs demonstrate that LogicToP effectively improves their performance on this task. Our LogicTableLLaMA-3.1-8B model in the 5-shot LogicToP setting achieves state-of-the-art results on the Logic2Text dataset. The code and data will be released at https://github.com/FXLP/LogToP to support future work on table-related tasks.
Biomedical data-to-text (D2T) generation aims to produce natural language descriptions that fluently and precisely describe biomedical structured data. However, the task suffers from a lack of labeled data, owing to the privacy constraints and scarcity of medical data. Large language models (LLMs) have demonstrated the ability to solve few-shot tasks through in-context learning (ICL). In this paper, we are the first to explore the performance of different LLMs on the biomedical data-to-text generation task. To address the issues of semantic sparsity and misinterpretation of numerical values in biomedical structured data, we propose the EAG (Enrich, Aggregate, and Generate) framework, a simple but efficient LLM-based three-stage approach to biomedical D2T in low-resource scenarios. We conduct extensive evaluations of closed-source general LLMs, open-source general LLMs, and open-source medical LLMs. The results show that the EAG framework provides good interpretability and superior performance, achieving state-of-the-art results on the BioLeaflets dataset. The code and data will be released at https://github.com/FXLP/EAG.

2025

Despite the remarkable performance of Large Language Models (LLMs) in automated discharge summary generation, they still generate inaccurate content or fabricate information without valid sources. To address these issues, we propose LCDS, a tool that empowers LLMs with Logic-Controlled Discharge Summary generation. LCDS constructs a source mapping table by calculating the textual similarity between electronic medical records (EMRs) and discharge summaries, providing a structured reference for generation. Based on a comprehensive set of logical rules, LCDS identifies the structured writing logic of discharge summaries and integrates it with EMRs to generate silver discharge summaries. Furthermore, LCDS traces the provenance of generated content, allowing experts to review it, provide feedback, and rectify errors to produce golden discharge summaries, which are subsequently recorded for the incremental fine-tuning of LLMs. Our project and demo video are available in the GitHub repository at https://github.com/ycycyc02/LCDS.
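
The idea of a source mapping table built from textual similarity can be illustrated with a minimal sketch. This is not the LCDS implementation: the abstract does not say which similarity measure or segmentation LCDS uses, so the character-level ratio from Python's standard `difflib`, the function name `build_source_mapping`, the `threshold` parameter, and the sample texts are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def build_source_mapping(emr_segments, summary_sentences, threshold=0.5):
    """Map each summary sentence to its most similar EMR segment.

    Hypothetical helper: returns rows of (summary_index, emr_index, score).
    emr_index is None when no EMR segment reaches the threshold, which
    flags the sentence as lacking an identifiable source.
    """
    table = []
    for s_idx, sentence in enumerate(summary_sentences):
        best_idx, best_score = None, 0.0
        for e_idx, segment in enumerate(emr_segments):
            # Assumed stand-in similarity: difflib's character-level ratio.
            score = SequenceMatcher(None, sentence.lower(), segment.lower()).ratio()
            if score > best_score:
                best_idx, best_score = e_idx, score
        if best_score < threshold:
            best_idx = None
        table.append((s_idx, best_idx, round(best_score, 3)))
    return table


# Invented sample data, for illustration only.
emr = [
    "Patient admitted with chest pain and shortness of breath.",
    "Discharged on aspirin 100 mg daily.",
]
summary = [
    "The patient was admitted with chest pain and shortness of breath.",
    "Follow-up imaging was scheduled.",
]
for row in build_source_mapping(emr, summary):
    print(row)
```

Rows whose EMR index is `None` correspond to summary sentences with no sufficiently similar source, which is the kind of content a provenance review would want to surface first. A real system would likely use a stronger similarity measure than character overlap.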