Xiaopeng Li
Other people with similar names: Xiaopeng Li
2025
Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation
Pengyue Jia
|
Derong Xu
|
Xiaopeng Li
|
Zhaocheng Du
|
Xiangyang Li
|
Yichao Wang
|
Yuhao Wang
|
Qidong Liu
|
Maolin Wang
|
Huifeng Guo
|
Ruiming Tang
|
Xiangyu Zhao
Findings of the Association for Computational Linguistics: ACL 2025
The reranker and generator are two critical components in the Retrieval-Augmented Generation (i.e., RAG) pipeline, responsible for ranking relevant documents and generating responses. However, due to differences in pre-training data and objectives, there is an inevitable gap between the documents ranked as relevant by the reranker and those required by the generator to support answering the query. To address this gap, we propose RADIO, a novel and practical preference alignment framework with RAtionale DIstillatiOn. Specifically, We first propose a rationale extraction method that leverages the reasoning capabilities of large language models (LLMs) to extract the rationales necessary for answering the query. Subsequently, a rationale-based alignment process is designed to rerank the documents based on the extracted rationales, and fine-tune the reranker to align the preferences. We conduct extensive experiments on two tasks across three datasets to demonstrate the effectiveness of our approach compared to baseline methods. Our code is released online to ease reproduction.
Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition?
Xiangyang Li
|
Xiaopeng Li
|
Kuicai Dong
|
Zhangquanhu
|
Rongju Ruan
|
Xinyi Dai
|
Yasheng Wang
|
Ruiming Tang
Findings of the Association for Computational Linguistics: EMNLP 2025
Code generation is a core capability of large language models (LLMs), yet mainstream benchmarks (e.g., APPs and LiveCodeBench) contain questions with medium-level difficulty and pose no challenge to advanced LLMs. To better reflected the advanced reasoning and code generation ability, We introduce Humanity’s Last Code Exam (HLCE), comprising 235 most challenging problems from the International Collegiate Programming Contest (ICPC World Finals) and the International Olympiad in Informatics (IOI) spanning 2010 – 2024. As part of HLCE, we design a harmonized online–offline sandbox that guarantees fully reproducible evaluation. Through our comprehensive evaluation, we observe that even the strongest reasoning LLMs: o4-mini(high) and Gemini-2.5 Pro, achieve pass@1 rates of only 15.9% and 11.4%, respectively. Meanwhile, we propose a novel “self-recognition” task to measure LLMs’ awareness of their own capabilities. Results indicate that LLMs’ self-recognition abilities are not proportionally correlated with their code generation performance. Finally, our empirical validation of test-time scaling laws reveals that current advanced LLMs have substantial room for improvement on complex programming tasks. We expect HLCE to become a milestone challenge for code generation and to catalyze advances in high-performance reasoning and human–AI collaborative programming. Our code and dataset are also public available¹.https://github.com/Humanity-s-Last-Code-Exam/HLCE
Search
Fix author
Co-authors
- Xiangyang Li 2
- Ruiming Tang 2
- Xinyi Dai 1
- Kuicai Dong 1
- Zhaocheng Du 1
- show all...