Junjun Wang
2025
Winning ClimateCheck: A Multi-Stage System with BM25, BGE-Reranker Ensembles, and LLM-based Analysis for Scientific Abstract Retrieval
Junjun Wang, Kunlong Chen, Zhaoqun Chen, Peng He, Wenlu Zheng
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
The ClimateCheck shared task addresses the critical challenge of grounding social media claims about climate change in scientific literature. This paper details our winning approach. For abstract retrieval, we propose a multi-stage pipeline: (1) initial candidate generation from a corpus of approximately 400,000 abstracts using BM25; (2) fine-grained reranking of these candidates using an ensemble of BGE-Reranker cross-encoder models, fine-tuned on a specialized training set that incorporates both random and hard negative samples; and (3) final list selection based on a score ensembled with Reciprocal Rank Fusion (RRF). For claim verification, we use Gemini 2.5 Pro, guided by carefully engineered prompts, to classify the relationship between each claim and the retrieved abstracts as Supports, Refutes, or Not Enough Information. Our system achieved first place in both subtasks, demonstrating the efficacy of combining robust sparse retrieval, powerful neural rerankers, strategic negative sampling, and LLM-based semantic analysis for connecting social media discourse to scientific evidence. Part of the example code is available at https://anonymous.4open.science/r/climatecheck_solution-1120
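To make the final selection step more concrete, the following is a minimal sketch of Reciprocal Rank Fusion over the ranked lists produced by several rerankers. It is an illustration under stated assumptions, not the authors' implementation: the function name `rrf_fuse`, the smoothing constant `k=60` (a commonly used default, not a value taken from the abstract), the document IDs, and the three example rankings are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. k=60 is an assumed default, not a reported setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical rankings from three rerankers for a single claim.
reranker_a = ["abs_12", "abs_7", "abs_3"]
reranker_b = ["abs_7", "abs_12", "abs_9"]
reranker_c = ["abs_7", "abs_3", "abs_12"]

top = rrf_fuse([reranker_a, reranker_b, reranker_c], top_n=2)
for doc_id, score in top:
    print(doc_id, round(score, 5))
```

Because RRF uses only rank positions, it does not require the rerankers' raw scores to be on a comparable scale, which is one reason it suits ensembles of independently fine-tuned models.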