Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen
Abstract
Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of available evidence from different studies, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used for systematic reviews. Derived from over 500 biomedical studies, the three benchmark tasks encompass expert reviewers’ judgments of studies’ research methodologies, including the assessments of risk of bias within these studies. The benchmark contains a human-validated annotation pipeline for fine-grained alignment of reviewers’ judgments with research paper sentences. Our analyses show that large language models’ reasoning and retrieval capabilities impact their effectiveness with risk-of-bias assessment. The dataset is available at https://github.com/RoBBR-Benchmark/RoBBR.- Anthology ID:
- 2025.emnlp-main.160
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3220–3248
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.160/
- DOI:
- Cite (ACL):
- Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, and Leon Bergen. 2025. Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3220–3248, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark (Wang et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.160.pdf