Qichao Zhang
2026
Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection
Minghui Jia | Qichao Zhang | Ali Luo | Linjing Li | Shuo Ye | Hailing Lu | Wen Hou | Dongbin Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on labor-intensive visual inspection by experts, which has become a primary bottleneck as modern spectroscopic surveys continue to scale. To bridge this gap, we propose Spec-o3, a tool-augmented vision-language agent that performs astronomer-aligned spectral inspection via interleaved multimodal chain-of-thought reasoning. Spec-o3 is trained with a two-stage post-training recipe: cold-start supervised fine-tuning on expert inspection trajectories followed by outcome-based reinforcement learning on rare-type verification tasks. Evaluated on five rare-object identification tasks from LAMOST, Spec-o3 establishes a new state of the art, boosting the macro-F1 score from 28.3 to 76.5 with a 7B-parameter base model and outperforming both proprietary VLMs and specialized deep models. Beyond accuracy, Spec-o3 processes spectra at ∼0.2 s per sample on an 8×H100 server, a ∼50× throughput gain over expert manual inspection. The agent also demonstrates strong generalization to unseen inspection tasks across survey shifts (from LAMOST to SDSS/DESI). Expert evaluations further confirm that its reasoning traces are coherent and physically consistent, supporting transparent and trustworthy decision-making. Code, data, and models are available at the project homepage.
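The headline metric in this abstract is macro-F1, which averages per-class F1 scores with equal weight so that rare classes count as much as common ones. The following is a minimal sketch of that metric only; the class labels and sample data are hypothetical and are not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels: one rare class ("rare") and one common class ("common").
y_true = ["rare", "rare", "common", "common", "common", "common"]
y_pred = ["rare", "common", "common", "common", "common", "common"]
score = macro_f1(y_true, y_pred)
```

In this toy case one missed rare object lowers the rare-class F1 to 2/3, and that pulls the macro average down noticeably even though five of six predictions are correct.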
Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching
Bo Lv | Jingbo Sun | Jianwei Lv | Chen Tang | Shaojie Zhang | Nayu Liu | Guoxin Yu | Zihao Li | Qichao Zhang | Dongbin Zhao | Ping Luo | Yue Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Optimizing the trade-off between predictive performance and computational cost is a central focus in the deployment of Large Language Models (LLMs). Current routing methods primarily rely on direct mapping from queries to models based on surface-level features, making them susceptible to the memorization trap and leading to poor generalizability on out-of-distribution (OOD) data. In this paper, we propose DecoR, a novel routing framework that recasts the routing task as a matching process that retrieves similar queries from historical logs, effectively mitigating the memorization trap. To enhance matching accuracy, we introduce a query capability deconstruction method that decouples linguistic surface forms from task-intrinsic requirements, directing matching toward capability dimensions to ground decisions in essential task attributes. Furthermore, we develop CodaSet, a comprehensive benchmark for assessing routing generalization. Experimental results on it demonstrate that DecoR maintains superior accuracy while substantially lowering inference costs across both in-distribution and OOD settings.
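The idea of routing by matching against historical logs can be illustrated with a small nearest-neighbor sketch. This is not DecoR's actual algorithm: the capability dimensions, the log format, the cosine-similarity matching, the `k` and `min_acc` parameters, and the model names are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query_caps, history, costs, k=2, min_acc=0.6):
    """Pick the cheapest model that did well on the k most similar past queries."""
    if not history:
        raise ValueError("history must contain at least one logged query")
    neighbors = sorted(
        history, key=lambda rec: cosine(query_caps, rec["caps"]), reverse=True
    )[:k]
    # Estimated accuracy of each model on the matched neighbors.
    acc = {
        m: sum(rec["correct"][m] for rec in neighbors) / len(neighbors)
        for m in costs
    }
    good_enough = [m for m in costs if acc[m] >= min_acc]
    if good_enough:
        return min(good_enough, key=lambda m: costs[m])
    # Fall back to the most accurate model when none clears the threshold.
    return max(costs, key=lambda m: acc[m])


# Hypothetical capability axes: (math, coding, world knowledge).
history = [
    {"caps": [1.0, 0.0, 0.0], "correct": {"small": False, "large": True}},
    {"caps": [0.9, 0.1, 0.0], "correct": {"small": False, "large": True}},
    {"caps": [0.0, 0.0, 1.0], "correct": {"small": True, "large": True}},
    {"caps": [0.1, 0.0, 1.0], "correct": {"small": True, "large": True}},
]
costs = {"small": 1.0, "large": 10.0}

math_choice = route([1.0, 0.0, 0.1], history, costs)
trivia_choice = route([0.0, 0.0, 1.0], history, costs)
```

Because matching happens on capability vectors rather than on the query text, two queries with very different wording but the same requirements would land on the same neighbors, which is the intuition the abstract describes.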
2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
Yuqian Fu | Yuanheng Zhu | Jiajun Chai | Guojun Yin | Wei Lin | Qichao Zhang | Dongbin Zhao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose **R**einforcement **L**earning-**A**ssisted **E**nsemble for LLMs (RLAE), a novel framework that reformulates LLM ensemble through the lens of a Markov Decision Process (MDP). Our approach introduces an RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms (RLAE_PPO and RLAE_MAPPO), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to 3.3% accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower latency. The source code is available here.
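The contrast between fixed and dynamically adjusted ensemble weights can be made concrete with a small sketch. It assumes the ensemble mixes per-model next-token probability distributions and that a policy emits one weight logit per model at each step; the abstract does not specify either detail, and the function names and numbers below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_distribution(model_probs, weight_logits):
    """Mix per-model token distributions using softmax-normalized weights.

    model_probs: one probability list per model, all over the same vocabulary.
    weight_logits: one score per model, here standing in for a policy's output.
    """
    weights = softmax(weight_logits)
    vocab_size = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs))
        for i in range(vocab_size)
    ]


# Two hypothetical models over a three-token vocabulary.
model_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]

# Equal weights, as a fixed-weight ensemble would use at every step.
uniform_mix = ensemble_distribution(model_probs, [0.0, 0.0])

# Weights skewed toward the first model, as a context-aware policy might emit.
skewed_mix = ensemble_distribution(model_probs, [10.0, -10.0])
```

With equal weights the second token wins, while the skewed weights flip the choice to the first token. A learned policy would produce such weight logits from the input context and the partial generation instead of using constants.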