Winning ClimateCheck: A Multi-Stage System with BM25, BGE-Reranker Ensembles, and LLM-based Analysis for Scientific Abstract Retrieval
Junjun Wang, Kunlong Chen, Zhaoqun Chen, Peng He, Wenlu Zheng
Abstract
The ClimateCheck shared task addresses the critical challenge of grounding social media claims about climate change in scientific literature. This paper details our winning approach. For abstract retrieval, we propose a multi-stage pipeline: (1) initial candidate generation from a corpus of ~400,000 abstracts using BM25; (2) fine-grained reranking of these candidates using an ensemble of BGE-Reranker cross-encoder models, fine-tuned on a specialized training set that incorporates both random and hard negative samples; and (3) final list selection based on a score ensembled with Reciprocal Rank Fusion (RRF). For claim verification, we use Gemini 2.5 Pro, guided by carefully engineered prompts, to classify the relationship between each claim and its retrieved abstracts as Supports, Refutes, or Not Enough Information. Our system achieved first place in both subtasks, demonstrating the efficacy of combining robust sparse retrieval, powerful neural rerankers, strategic negative sampling, and LLM-based semantic analysis for connecting social media discourse to scientific evidence. Part of our example code is available at https://anonymous.4open.science/r/climatecheck_solution-1120
- Anthology ID: 2025.sdp-1.25
- Volume: Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
- Venues: sdp | WS
- Publisher: Association for Computational Linguistics
- Pages: 276–280
- URL: https://preview.aclanthology.org/display_plenaries/2025.sdp-1.25/
- Cite (ACL): Junjun Wang, Kunlong Chen, Zhaoqun Chen, Peng He, and Wenlu Zheng. 2025. Winning ClimateCheck: A Multi-Stage System with BM25, BGE-Reranker Ensembles, and LLM-based Analysis for Scientific Abstract Retrieval. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 276–280, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Winning ClimateCheck: A Multi-Stage System with BM25, BGE-Reranker Ensembles, and LLM-based Analysis for Scientific Abstract Retrieval (Wang et al., sdp 2025)
- PDF: https://preview.aclanthology.org/display_plenaries/2025.sdp-1.25.pdf
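The retrieval stages summarized in the abstract (BM25 candidate generation, reranker fine-tuning data with hard and random negatives, and RRF fusion of a reranker ensemble) can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation; part of their code is at the repository linked in the abstract. The sketch assumes a whitespace tokenizer, the common BM25 defaults k1 = 1.5 and b = 0.75, the conventional RRF constant k = 60, that RRF fuses the individual rerankers' rankings, and that hard negatives are the highest-ranked BM25 hits that are not gold abstracts. The BGE cross-encoders are not implemented: their outputs are stubbed as fixed, hypothetical rankings over a four-document toy corpus. The names `BM25`, `rrf`, and `build_training_pairs` are illustrative.

```python
import math
import random
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase + whitespace split (no stemming or stopword removal).
    return text.lower().split()


class BM25:
    """Minimal in-memory Okapi BM25 scorer (illustrative, not optimized for ~400k docs)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t in tf:
                # Term-frequency saturation (k1) with document-length normalization (b).
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
        return total

    def top_k(self, query: str, k: int) -> list[int]:
        # Rank all documents by score; ties are broken by document index.
        order = sorted(range(len(self.docs)), key=lambda i: (-self.score(query, i), i))
        return order[:k]


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: fused(d) = sum over rankings of 1 / (k + rank of d)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def build_training_pairs(bm25, query, gold, n_hard, n_random, seed=0):
    """Hypothetical helper: (query, doc_id, label) pairs for reranker fine-tuning."""
    gold = set(gold)
    # Hard negatives (assumed): top BM25 hits that are not gold abstracts.
    hard = [i for i in bm25.top_k(query, len(bm25.docs)) if i not in gold][:n_hard]
    # Random negatives: uniform sample from the remaining non-gold documents.
    pool = [i for i in range(len(bm25.docs)) if i not in gold and i not in hard]
    rand = random.Random(seed).sample(pool, min(n_random, len(pool)))
    positives = [(query, i, 1) for i in sorted(gold)]
    return positives + [(query, i, 0) for i in hard + rand]


# Toy corpus and claim (hypothetical data, for illustration only).
corpus = [
    "sea level rise accelerates due to ice sheet melt",
    "arctic sea ice extent declines in summer",
    "crop yields respond to drought and heat",
    "ice sheet melt contributes to global sea level rise",
]
query = "is sea level rise caused by melting ice"
bm25 = BM25(corpus)
candidates = bm25.top_k(query, k=3)  # stage 1: sparse candidate generation
# Stage 2 stub: hypothetical rankings of `candidates` from three fine-tuned rerankers.
reranker_rankings = [[3, 0, 1], [0, 3, 1], [3, 1, 0]]
final = rrf(reranker_rankings)  # stage 3: fuse the ensemble with RRF
pairs = build_training_pairs(bm25, query, gold=[3], n_hard=1, n_random=1)
print(candidates, final)  # prints [0, 3, 1] [3, 0, 1]
```

RRF uses only ranks, not raw scores, so rerankers whose scores lie on different scales can be fused without calibration. Hard negatives drawn from the first-stage retriever resemble the candidates the reranker actually sees at inference time, while random negatives keep easy, clearly irrelevant examples in the training mix.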