Amit Sangroya


2025

LLM Based Efficient CSR Summarization using Structured Fact Extraction and Feedback
Kunwar Zaid | Amit Sangroya | Lovekesh Vig
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

Summarizing clinical trial data poses a significant challenge due to the structured, voluminous, and domain-specific nature of clinical tables. While large language models (LLMs) such as ChatGPT, Llama, and DeepSeek demonstrate potential in table-to-text generation, they struggle with raw clinical tables that exceed context length, leading to incomplete, inconsistent, or imprecise summaries. These challenges stem from the structured nature of clinical tables, complex study designs, and the necessity for precise medical terminology. To address these limitations, we propose an end-to-end pipeline that enhances the summarization process by integrating fact selection, ensuring that only the most relevant data points are extracted for summary generation. Our approach also incorporates a feedback-driven refinement mechanism, allowing for iterative improvements based on domain-specific requirements and external expert input. By systematically filtering critical information and refining outputs, our method enhances the accuracy, completeness, and clinical reliability of generated summaries while reducing irrelevant or misleading content. This pipeline significantly improves the usability of LLM-generated summaries for medical professionals, regulators, and researchers, facilitating more efficient interpretation of clinical trial results. Our findings suggest that targeted preprocessing and iterative refinement strategies within the proposed pipeline can mitigate LLM limitations, offering a scalable solution for summarizing complex clinical trial tables.
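The fact-selection idea described in the abstract can be illustrated with a minimal sketch. The `Fact` schema, the relevance scores, and the prompt-budget cutoff below are all hypothetical illustrations, not the paper's actual extraction method:

```python
# Toy sketch of structured fact selection before summary generation:
# score candidate facts, keep only the most relevant ones that fit the
# prompt budget, and build a compact prompt from them.
# The Fact schema and relevance values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Fact:
    field: str        # e.g. a clinical-table column name
    value: str        # the extracted cell value
    relevance: float  # hypothetical score from a ranking step

def select_facts(facts, budget=3):
    """Keep only the highest-relevance facts within the prompt budget."""
    ranked = sorted(facts, key=lambda f: f.relevance, reverse=True)
    return ranked[:budget]

facts = [
    Fact("primary_endpoint", "HbA1c reduction of 1.2%", 0.95),
    Fact("enrollment", "412 participants", 0.80),
    Fact("site_count", "17 sites", 0.40),
    Fact("sponsor_address", "(omitted)", 0.05),
]
selected = select_facts(facts)
prompt = "Summarize: " + "; ".join(f"{f.field}={f.value}" for f in selected)
```

In a real pipeline, `prompt` would then be sent to the LLM, and the feedback-driven refinement loop would re-score or re-select facts based on expert input.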

Multilingual Clinical Dialogue Summarization and Information Extraction with Qwen-1.5B LoRA
Kunwar Zaid | Amit Sangroya | Jyotsana Khatri
NLP-AI4Health

This paper describes our submission to the NLP-AI4Health 2025 Shared Task on multilingual clinical dialogue summarization and structured information extraction. Our system is based on Qwen-1.5B Instruct fine-tuned with LoRA adapters for parameter-efficient adaptation. The pipeline produces (i) concise English summaries, (ii) schema-aligned JSON outputs, and (iii) multilingual Q&A responses. The Qwen-based approach substantially improves summary fluency, factual completeness, and JSON field coverage while maintaining efficiency within constrained GPU resources.
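The parameter-efficient adaptation mentioned above rests on the LoRA idea of adding a trainable low-rank update to a frozen weight. The sketch below shows the mechanism in NumPy; the layer dimensions, rank, and scaling are illustrative, not the actual fine-tuning configuration used for Qwen-1.5B:

```python
# Minimal sketch of a LoRA layer: the frozen weight W is augmented with a
# trainable low-rank product B @ A, so only r * (d_in + d_out) parameters
# are trained instead of d_out * d_in. Dimensions and rank are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8                 # hypothetical layer size and rank

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # Frozen path plus scaled low-rank path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts identical to the
# frozen layer; training then updates only A and B.
y = lora_forward(x)
```

This is why LoRA fits constrained GPU budgets: here the trainable adapter holds 8 × (64 + 64) = 1,024 parameters versus 4,096 in the frozen weight, and the gap grows quadratically with layer width.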