Massimiliano Pronesti
2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Massimiliano Pronesti | Joao H Bettencourt-Silva | Paul Flanagan | Alessandra Pascale | Oisín Redmond | Anya Belz | Yufang Hou
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Extracting scientific evidence from biomedical studies for clinical research questions (e.g., Does stem cell transplantation improve quality of life in patients with medically refractory Crohn’s disease compared to placebo?) is a crucial step in synthesising biomedical evidence. In this paper, we focus on the task of document-level scientific evidence extraction for clinical questions with conflicting evidence. To support this task, we create a dataset called CochraneForest leveraging forest plots from Cochrane systematic reviews. It comprises 202 annotated forest plots, associated clinical research questions, full texts of studies, and study-specific conclusions. Building on CochraneForest, we propose URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation framework designed to tackle the unique challenges of evidence extraction. Our experiments show that URCA outperforms the best existing methods by up to 10.3% in F1 score on this task. However, the results also underscore the complexity of CochraneForest, establishing it as a challenging testbed for advancing automated evidence synthesis systems.
2024
Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups
Simon Mille | Massimiliano Pronesti | Craig Thomson | Michela Lorandi | Sophie Fitzpatrick | Rudali Huidrom | Mohammed Sabry | Amy O’Riordan | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations
Wikipedia is known to have systematic gaps in its coverage that correspond to under-resourced languages as well as underrepresented groups. This paper presents a new tool to support efforts to fill in these gaps by automatically generating draft articles and facilitating post-editing and uploading to Wikipedia. A rule-based generator and an input-constrained LLM are used to generate two alternative articles, enabling the often more fluent, but error-prone, LLM-generated article to be content-checked against the more reliable, but less fluent, rule-generated article.
Co-authors
- Anya Belz 2
- Joao H Bettencourt-Silva 1
- Sophie Fitzpatrick 1
- Paul Flanagan 1
- Yufang Hou 1