BioConflict: A Benchmark for Evaluating Large Language Models in Biomedical Contradiction Detection and Consensus Synthesis

Ashwin Kirubakaran; Henry Gagnier

Abstract

Resolving contradictions in biomedical literature requires more than factual recall; it demands identifying the hidden variables that explain divergent findings. Existing NLI benchmarks such as MedNLI operate at the sentence level and fail to capture document-level conflicts driven by differences in dosage, cell type, or study design. We introduce BioConflict, a benchmark of 250 expert-annotated paper pairs (500 abstracts) across ten biomedical topics, formalizing three tasks: conflict detection, contextual variable extraction, and consensus synthesis. We evaluate five general-purpose large language models and two domain-specific baselines, finding that general-purpose large language models achieve strong conflict detection (F1 up to 0.89) but exhibit brittle reasoning in synthesis, while domain-specific models lag significantly on all generative tasks. These findings highlight the need for context-aware biomedical AI capable of resolving, not merely retrieving, conflicting scientific evidence.
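The conflict detection score quoted above is an F1 value. As a minimal sketch of how such a score could be computed, the snippet below assumes each paper pair carries a binary gold label (conflict or no conflict) and a binary model prediction. The function name `conflict_f1` and the binary labeling scheme are illustrative assumptions; the abstract does not specify the benchmark's label set or its scoring code.

```python
from typing import Sequence


def conflict_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for the positive ("conflict") class over paired labels.

    Hypothetical helper: assumes one binary gold label and one binary
    prediction per paper pair, which the abstract does not confirm.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)

    # With no true positives, precision and recall are both zero
    # (or undefined), so report 0.0 rather than dividing by zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for four paper pairs (made up for illustration only).
gold_labels = [True, True, False, False]
model_preds = [True, False, True, False]
score = conflict_f1(gold_labels, model_preds)
```

F1 balances precision and recall, so a model that flags every pair as conflicting does not score well simply by catching all true conflicts.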

Anthology ID: 2026.bionlp-1.44
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 552–558
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.44/
Cite (ACL): Ashwin Kirubakaran and Henry Gagnier. 2026. BioConflict: A Benchmark for Evaluating Large Language Models in Biomedical Contradiction Detection and Consensus Synthesis. In BioNLP 2026, pages 552–558, San Diego, California. Association for Computational Linguistics.
Cite (Informal): BioConflict: A Benchmark for Evaluating Large Language Models in Biomedical Contradiction Detection and Consensus Synthesis (Kirubakaran & Gagnier, BioNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.44.pdf
