Zhu Min
2026
Progressive Re-ranking for Multimodal Retrieval-Augmented Generation via Curriculum Learning
Zhu Min | Yanchao Hao | Jian Liu | Shizhu He | Xi Chen
Findings of the Association for Computational Linguistics: ACL 2026
Retrieval-augmented generation (RAG) can enhance large language models (LLMs) by providing external knowledge and helping reduce hallucinations. In multimodal RAG, however, retrieval remains challenging because a single retriever may fail to capture fine-grained multimodal semantics, and visually or semantically similar entities may still contain misleading information for answer generation. We propose a progressive multimodal re-ranking framework with curriculum learning to improve CLIP-based visual coarse-grained retrieval. Our framework progressively refines retrieval results through two stages: fine-grained section-level re-ranking and multimodal section reassessment. To better align re-ranking with multimodal queries, we introduce a curriculum-learning strategy that trains the model with hard negatives that are visually or semantically similar but contain misleading information. Experiments on InfoSeek and Enc-VQA show that our method achieves state-of-the-art answer accuracy and competitive retrieval performance.
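The abstract describes two ideas that are easier to picture as code: a cascade that re-scores coarse candidates in successive stages, and a curriculum that introduces harder (more similar but misleading) negatives as training progresses. The sketch below is a hypothetical illustration, not the authors' implementation. The scorer callables, the parameter names `k1`/`k2`, the `similarity` field and the linear pool-growth schedule are all assumptions, and queries are reduced to plain strings even though the real queries are multimodal.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scorer type: maps (query, section text) to a relevance score.
Scorer = Callable[[str, str], float]


def progressive_rerank(query: str, candidates: List[str],
                       section_scorer: Scorer, multimodal_scorer: Scorer,
                       k1: int, k2: int) -> List[str]:
    """Refine coarse retrieval results in two successive re-ranking stages."""
    # Stage 1 (assumed form): section-level re-ranking, keep the top k1 sections.
    stage1 = sorted(candidates, key=lambda s: section_scorer(query, s),
                    reverse=True)[:k1]
    # Stage 2 (assumed form): multimodal reassessment of survivors, keep the top k2.
    return sorted(stage1, key=lambda s: multimodal_scorer(query, s),
                  reverse=True)[:k2]


@dataclass
class Negative:
    text: str
    similarity: float  # assumed: how close the misleading negative is to the query


def curriculum_negatives(negatives: List[Negative], epoch: int,
                         num_epochs: int) -> List[Negative]:
    """Return the negatives used at `epoch`, admitting harder ones over time."""
    # Easy (least similar) negatives first; hard, misleading ones come last.
    ordered = sorted(negatives, key=lambda n: n.similarity)
    # Integer ceiling of (epoch + 1) / num_epochs * len(ordered).
    cutoff = ((epoch + 1) * len(ordered) + num_epochs - 1) // num_epochs
    # Always use at least one negative and never more than the full pool.
    return ordered[:min(max(cutoff, 1), len(ordered))]
```

Growing the negative pool from dissimilar to near-duplicate examples is one common way to realize a curriculum; the paper may order or weight negatives differently.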
2025
SEARA: An Automated Approach for Obtaining Optimal Retrievers
Zou Yuheng | Wang Yiran Yiran | Tian Yuzhu | Zhu Min | Yanhua Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Retrieval-Augmented Generation (RAG) is a core approach for enhancing Large Language Models (LLMs), and the effectiveness of the retriever largely determines the overall response quality of a RAG system. Retrievers have many hyperparameters that significantly affect performance and are sensitive to the specific application, yet hyperparameter optimization entails prohibitively high computational cost. Existing evaluation methods are either too costly or disconnected from domain-specific scenarios. This paper proposes SEARA (Subset sampling Evaluation for Automatic Retriever Assessment), which addresses the evaluation-data challenge through subset sampling and achieves robust automated retriever evaluation through minimal retrieval-fact extraction and comprehensive retrieval metrics. Based on real user queries, the method enables fully automated, low-cost retriever evaluation, thereby obtaining the optimal retriever for a specific business scenario. We validate the method on classic RAG applications in rednote, including a knowledge-based Q&A system and a retrieval-based travel assistant, and successfully obtain scenario-specific optimal retrievers.
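The evaluation loop summarized above (sample a subset of real queries, check whether each retriever surfaces the minimal facts a query needs, and keep the best configuration) can be sketched roughly as follows. This is a hypothetical illustration only: the `facts_by_query` mapping, the substring-based fact check and the mean fact-recall metric are assumptions standing in for SEARA's actual fact extraction and metrics.

```python
import random
from typing import Callable, Dict, List

# Hypothetical retriever interface: query -> list of retrieved passages.
Retriever = Callable[[str], List[str]]


def fact_recall(facts: List[str], passages: List[str]) -> float:
    """Fraction of required facts found in at least one retrieved passage.

    Substring matching is a simplifying stand-in for a real fact check.
    """
    if not facts:
        return 1.0
    hits = sum(any(fact in p for p in passages) for fact in facts)
    return hits / len(facts)


def evaluate_on_subset(retriever: Retriever,
                       facts_by_query: Dict[str, List[str]],
                       sample_size: int, seed: int = 0) -> float:
    """Mean fact recall of `retriever` over a random subset of queries."""
    if not facts_by_query:
        raise ValueError("no queries to evaluate")
    rng = random.Random(seed)
    k = min(sample_size, len(facts_by_query))
    queries = rng.sample(sorted(facts_by_query), k)
    return sum(fact_recall(facts_by_query[q], retriever(q))
               for q in queries) / len(queries)


def select_best(retrievers: Dict[str, Retriever],
                facts_by_query: Dict[str, List[str]],
                sample_size: int, seed: int = 0) -> str:
    """Name of the retriever configuration with the highest subset score."""
    return max(retrievers, key=lambda name: evaluate_on_subset(
        retrievers[name], facts_by_query, sample_size, seed))
```

Reusing the same seed for every configuration means all candidates are scored on an identical query subset, so differences in score reflect the retrievers rather than the sample.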