Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark

Jianyou Wang; Weili Cao; Longtian Bao; Youze Zheng; Gil Pasternak; Kaicheng Wang; Xiaoyue Wang (王笑月); Ramamohan Paturi; Leon Bergen

Abstract

Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of available evidence from different studies, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used for systematic reviews. Derived from over 500 biomedical studies, the three benchmark tasks encompass expert reviewers’ judgments of studies’ research methodologies, including the assessments of risk of bias within these studies. The benchmark contains a human-validated annotation pipeline for fine-grained alignment of reviewers’ judgments with research paper sentences. Our analyses show that large language models’ reasoning and retrieval capabilities impact their effectiveness with risk-of-bias assessment. The dataset is available at https://github.com/RoBBR-Benchmark/RoBBR.

Anthology ID: 2025.emnlp-main.160
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3220–3248
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.160/
Cite (ACL): Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, and Leon Bergen. 2025. Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3220–3248, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark (Wang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.160.pdf
Checklist: 2025.emnlp-main.160.checklist.pdf
