Sitong Zhou
2026
RadTimeline: Timeline Summarization for Longitudinal Radiological Lung Findings
Sitong Zhou | Meliha Yetisgen | Mari Ostendorf
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Tracking findings in longitudinal radiology reports is crucial for accurately identifying disease progression, and the time-consuming process would benefit from automatic summarization. This work introduces a structured summarization task, where we frame longitudinal report summarization as a timeline generation task, with dated findings organized in columns and temporally related findings grouped in rows. This structured summarization format enables straightforward comparison of findings across time and facilitates fact-checking against the associated reports. The timeline is generated using a 3-step LLM process of extracting findings, generating group names, and using the names to group the findings. To evaluate such systems, we create RadTimeline, a timeline dataset focused on tracking lung-related radiologic findings in chest-related imaging reports. Experiments on RadTimeline show tradeoffs of different-sized LLMs and prompting strategies. Our results highlight that group name generation as an intermediate step is critical for effective finding grouping. The best configuration has some irrelevant findings but very good recall, and grouping performance is comparable to human annotators.
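The timeline layout described in the abstract (group names as rows, report dates as columns) and the three steps (extract findings, generate group names, group the findings) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `build_timeline`, its three callable parameters, the toy keyword-based stand-ins for the LLM steps, and the sample report text are all hypothetical and chosen only to show the data flow.

```python
from typing import Callable, Dict, List, Tuple

# A finding is (report_date, finding_text); this representation is an assumption.
Finding = Tuple[str, str]
# Timeline: group name (row) -> report date (column) -> findings in that cell.
Timeline = Dict[str, Dict[str, List[str]]]


def build_timeline(
    reports: Dict[str, str],
    extract: Callable[[str], List[str]],
    name_groups: Callable[[List[Finding]], List[str]],
    assign: Callable[[Finding, List[str]], str],
) -> Timeline:
    """Compose the three steps; each callable stands in for one LLM call."""
    # Step 1: extract findings from each dated report (ISO dates sort correctly).
    findings: List[Finding] = [
        (date, text)
        for date, report in sorted(reports.items())
        for text in extract(report)
    ]
    # Step 2: generate candidate group names from all extracted findings.
    names = name_groups(findings)
    # Step 3: use the names to place each finding in a row (group) and column (date).
    timeline: Timeline = {name: {} for name in names}
    for date, text in findings:
        group = assign((date, text), names)
        timeline.setdefault(group, {}).setdefault(date, []).append(text)
    return timeline


# Toy stand-ins for the LLM steps (hypothetical, keyword-based).
def toy_extract(report: str) -> List[str]:
    return [part.strip() for part in report.split(";") if part.strip()]


def toy_name_groups(findings: List[Finding]) -> List[str]:
    return ["nodule", "effusion"]


def toy_assign(finding: Finding, names: List[str]) -> str:
    # Pick the first group name mentioned in the finding text, else "other".
    return next((name for name in names if name in finding[1]), "other")


# Invented sample reports, for illustration only.
reports = {
    "2020-06-10": "nodule now 5 mm",
    "2020-01-05": "3 mm nodule in right upper lobe; small pleural effusion",
}
timeline = build_timeline(reports, toy_extract, toy_name_groups, toy_assign)
for group, columns in timeline.items():
    print(group, columns)
```

Keeping each step as a separate callable mirrors the abstract's point that group name generation is a distinct intermediate step: the naming function can be swapped or removed without touching extraction or assignment.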
2023
Building blocks for complex tasks: Robust generative event extraction for radiology reports under domain shifts
Sitong Zhou | Meliha Yetisgen | Mari Ostendorf
Proceedings of the 5th Clinical Natural Language Processing Workshop
This paper explores methods for extracting information from radiology reports that generalize across exam modalities to reduce requirements for annotated data. We demonstrate that multi-pass T5-based text-to-text generative models exhibit better generalization across exam modalities compared to approaches that employ BERT-based task-specific classification layers. We then develop methods that reduce the inference cost of the model, making large-scale corpus processing more feasible for clinical applications. Specifically, we introduce a generative technique that decomposes complex tasks into smaller subtask blocks, which improves a single-pass model when combined with multitask training. In addition, we leverage target-domain contexts during inference to enhance domain adaptation, enabling use of smaller models. Analyses offer insights into the benefits of different cost reduction strategies.