Gil Pasternak
2025
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang | Weili Cao | Longtian Bao | Youze Zheng | Gil Pasternak | Kaicheng Wang | Xiaoyue Wang | Ramamohan Paturi | Leon Bergen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of available evidence from different studies, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used for systematic reviews. Derived from over 500 biomedical studies, the three benchmark tasks encompass expert reviewers’ judgments of studies’ research methodologies, including assessments of risk of bias within these studies. The benchmark contains a human-validated annotation pipeline for fine-grained alignment of reviewers’ judgments with research paper sentences. Our analyses show that large language models’ reasoning and retrieval capabilities impact their effectiveness at risk-of-bias assessment. The dataset is available at https://github.com/RoBBR-Benchmark/RoBBR.
GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction
Urchade Zaratiana | Gil Pasternak | Oliver Boyd | George Hurn-Maloney | Ash Lewis
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a fine-tuned encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across diverse IE tasks with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source library available through pip, complete with pre-trained models and comprehensive documentation.