Chengliang Chai

2026

Recent research uses reinforcement learning (RL) to post-train Multi-modal Large Language Models (MLLMs) so that they can iteratively call search engines to access external knowledge dynamically when handling complex Visual Question Answering (VQA) tasks. However, existing methods face two major limitations, one in effectiveness and one in efficiency: i) For effectiveness, their objective considers only the correctness of the final generated response and overlooks the quality of intermediate search results, which leads to suboptimal search strategies. ii) For efficiency, existing methods often invoke unnecessary search calls during reasoning, making inference inefficient. To address these issues, we propose DORA, a customized dual-objective reinforcement learning framework that improves the search strategies of MLLMs, enhancing their search quality while minimizing search frequency. The key ideas include (1) a reward function that promotes correct reasoning trajectories with fewer search calls; and (2) dual optimization objectives that jointly optimize search quality and answer correctness. Extensive experiments on three real-world datasets demonstrate that DORA outperforms state-of-the-art methods, achieving up to 8.4% higher accuracy while reducing the number of search calls by 9.7%.

2025

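CCL25-Eval Task 4 System Report: Macro-Pattern Prompting and Efficient Fine-Tuning for Factivity Inference
Zequn Li | Yuanhao Zhong | Chengliang Chai
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

This paper studies predicate-guided factivity inference with large language models. In the no-fine-tuning setting, using the Gemini 2.5 Pro model, we construct chain-of-thought (CoT) prompts based on predicate type and, as a novel step, have the model learn from the entire answer-annotated sample set to induce macro-level patterns and rules, which yields an efficient prompt template. In the fine-tuning setting, we select the Qwen3-32b model, perform LoRA fine-tuning with llama factory, and use llama.cpp to convert the model to gguf format, quantize it, and deploy it with Ollama. Experimental results demonstrate the effectiveness of the proposed methods: on the no-fine-tuning track, the macro-pattern prompting method achieves 94.01% accuracy; on the fine-tuning track, the system based on the fine-tuned model achieves 92.61% accuracy.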

Co-authors

Ye Yuan 1

Venues

ACL 1
CCL 1
