LLM Based Efficient CSR Summarization using Structured Fact Extraction and Feedback

Kunwar Zaid, Amit Sangroya, Lovekesh Vig


Abstract
Summarizing clinical trial data poses a significant challenge due to the structured, voluminous, and domain-specific nature of clinical tables. While large language models (LLMs) such as ChatGPT, Llama, and DeepSeek demonstrate potential in table-to-text generation, they struggle with raw clinical tables that exceed context length, leading to incomplete, inconsistent, or imprecise summaries. These challenges stem from the structured nature of clinical tables, complex study designs, and the necessity for precise medical terminology. To address these limitations, we propose an end-to-end pipeline that enhances the summarization process by integrating fact selection, ensuring that only the most relevant data points are extracted for summary generation. Our approach also incorporates a feedback-driven refinement mechanism, allowing for iterative improvements based on domain-specific requirements and external expert input. By systematically filtering critical information and refining outputs, our method enhances the accuracy, completeness, and clinical reliability of generated summaries while reducing irrelevant or misleading content. This pipeline significantly improves the usability of LLM-generated summaries for medical professionals, regulators, and researchers, facilitating more efficient interpretation of clinical trial results. Our findings suggest that targeted preprocessing and iterative refinement strategies within the proposed pipeline can mitigate LLM limitations, offering a scalable solution for summarizing complex clinical trial tables.
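The two stages the abstract describes, fact selection followed by feedback-driven refinement, can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the keyword scoring, the check/fix pairs, and all function names are assumptions, and the LLM generation step is replaced by simple string joining.

```python
# Hypothetical sketch of the pipeline from the abstract: (1) select only the
# most relevant table rows so the prompt fits the LLM context window, then
# (2) iteratively refine the draft summary against domain-specific checks.
# All heuristics here are illustrative stand-ins, not the paper's method.

def select_facts(table_rows, keywords, top_k=3):
    """Score each row by how many study-critical keywords it mentions,
    and keep only the top-scoring rows."""
    scored = []
    for row in table_rows:
        score = sum(kw in row.lower() for kw in keywords)
        if score:
            scored.append((score, row))
    scored.sort(key=lambda pair: -pair[0])
    return [row for _, row in scored[:top_k]]


def refine_with_feedback(summary, feedback_checks, max_rounds=3):
    """Apply each failing check's fix until all checks pass, mimicking the
    abstract's iterative, feedback-driven refinement loop."""
    for _ in range(max_rounds):
        fixes = [fix for check, fix in feedback_checks if not check(summary)]
        if not fixes:
            break
        for fix in fixes:
            summary = fix(summary)
    return summary


rows = [
    "Adverse events: headache 12%, nausea 8%",
    "Site enrollment dates by region",
    "Primary endpoint: HbA1c reduction 1.1% vs placebo",
]
facts = select_facts(rows, keywords=["adverse", "endpoint", "hba1c"])
draft = " ".join(facts)  # stand-in for the actual LLM generation step
checks = [
    # Example domain check: the summary must mention the comparator arm.
    (lambda s: "placebo" in s.lower(), lambda s: s + " (vs placebo)"),
]
final = refine_with_feedback(draft, checks)
```

The enrollment-dates row scores zero and is filtered out before generation, which is the kind of context-length saving the abstract attributes to fact selection.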
Anthology ID:
2025.cl4health-1.12
Volume:
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Sophia Ananiadou, Dina Demner-Fushman, Deepak Gupta, Paul Thompson
Venues:
CL4Health | WS
Publisher:
Association for Computational Linguistics
Pages:
148–157
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.12/
Cite (ACL):
Kunwar Zaid, Amit Sangroya, and Lovekesh Vig. 2025. LLM Based Efficient CSR Summarization using Structured Fact Extraction and Feedback. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health), pages 148–157, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
LLM Based Efficient CSR Summarization using Structured Fact Extraction and Feedback (Zaid et al., CL4Health 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.12.pdf