Pengcheng Xu


2025

Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries
Pengcheng Xu | Sicheng Shen | Jieli Zhou | Hongyi Xin
BioNLP 2025 Shared Tasks

We propose a unified, multi-stage lay summarization pipeline for BioLaySumm 2025 (Subtask 1.1) that (1) selects and summarizes key article sections via BioBART, (2) retrieves K-shot demonstrations using BGE embeddings for in-context Llama 3 8B prompting, (3) applies LoRA adapters to Llama 3 8B for supervised fine-tuning, (4) merges section summaries with a second BioBART pass, and (5) refines outputs through reinforcement learning (PPO & GRPO) using a composite reward of factuality (AlignScore, SummaC), relevance (ROUGE-L, BERTScore), and readability (LENS, FKGL, DCRS, CLI). On the PLOS and eLife validation sets, our complete system reduces DCRS from 9.23 to 8.56 and CLI from 12.98 to 12.65, ranking 3rd in readability, and improves AlignScore over the fine-tuned Llama 3 baseline from 0.722 to 0.862, ranking 5th in factuality, demonstrating balanced gains across readability, relevance, and factuality.
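
A minimal sketch of how a composite reward of the kind described above might combine a relevance term with readability terms for PPO/GRPO refinement. This is not the authors' released code: the weights, the use of the rouge-score and textstat packages, and the omission of the factuality (AlignScore, SummaC), BERTScore, and LENS terms are assumptions for illustration.

```python
# Hypothetical composite-reward sketch for RL fine-tuning (PPO/GRPO).
# Assumptions: illustrative weights; factuality terms (AlignScore, SummaC),
# BERTScore, and LENS are omitted because they require separate model downloads.
from rouge_score import rouge_scorer
import textstat

_rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def composite_reward(summary: str, reference: str) -> float:
    # Relevance: ROUGE-L F1 against the reference lay summary (higher is better).
    relevance = _rouge.score(reference, summary)["rougeL"].fmeasure

    # Readability: FKGL, DCRS, and CLI are grade-level style scores (lower is
    # better), so their average is clamped and loosely rescaled into [0, 1].
    fkgl = textstat.flesch_kincaid_grade(summary)
    dcrs = textstat.dale_chall_readability_score(summary)
    cli = textstat.coleman_liau_index(summary)
    readability = 1.0 - min(max((fkgl + dcrs + cli) / 3.0, 0.0), 20.0) / 20.0

    # Illustrative equal weighting; the paper's reward also includes factuality.
    return 0.5 * relevance + 0.5 * readability

if __name__ == "__main__":
    ref = "The study shows that a common gene change raises heart disease risk."
    cand = "This work finds that a frequent gene variant increases heart disease risk."
    print(round(composite_reward(cand, ref), 3))
```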

2024

Team YXZ at BioLaySumm: Adapting Large Language Models for Biomedical Lay Summarization
Jieli Zhou | Cheng Ye | Pengcheng Xu | Hongyi Xin
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

Biomedical literature is crucial for disseminating new scientific findings. However, the complexity of these research articles often leads to misinterpretations by the public. To address this urgent issue, we participated in the BioLaySumm task at the 2024 ACL BioNLP workshop, which focuses on automatically simplifying technical biomedical articles for non-technical audiences. We conducted a systematic evaluation of SOTA large language models (LLMs) in 2024 and found that LLMs can generally achieve better readability scores than smaller models like BART. We then iteratively developed techniques of title infusing, K-shot prompting, LLM rewriting, and instruction finetuning to further boost readability while balancing factuality and relevance. Notably, our submission achieved first place in readability at the workshop, and among the top-3 teams with the highest readability scores, we had the best overall rank. Here, we present our experiments and findings on how to effectively adapt LLMs for automatic lay summarization. Our code is available at https://github.com/zhoujieli/biolaysumm.
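
As a rough illustration of the title infusing and K-shot prompting mentioned in this abstract, the sketch below assembles a title-infused prompt from retrieved article/lay-summary pairs. The prompt wording, field names, and choice of K are hypothetical and not the team's released implementation (see the repository linked above).

```python
# Hypothetical sketch of title-infused K-shot prompt assembly for an LLM.
# The instruction wording and data layout are assumptions for illustration.
from typing import Dict, List

def build_kshot_prompt(article: Dict[str, str],
                       demos: List[Dict[str, str]],
                       k: int = 2) -> str:
    """Infuse the article title and prepend up to K demonstration pairs."""
    parts = ["Rewrite the biomedical article below as a plain-language summary "
             "for a non-expert reader.\n"]
    for demo in demos[:k]:
        parts.append(f"Title: {demo['title']}\n"
                     f"Article: {demo['abstract']}\n"
                     f"Lay summary: {demo['lay_summary']}\n")
    parts.append(f"Title: {article['title']}\n"
                 f"Article: {article['abstract']}\n"
                 "Lay summary:")
    return "\n".join(parts)

if __name__ == "__main__":
    demos = [{"title": "A gene variant and heart disease",
              "abstract": "We analysed cohort data ...",
              "lay_summary": "A common gene change slightly raises heart risk."}]
    query = {"title": "Immune cells in wound healing",
             "abstract": "Using single-cell sequencing, we ..."}
    print(build_kshot_prompt(query, demos, k=1))
```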