Xiangjun Fan
2026
TARo: Token-level Adaptive Routing for LLM Test-time Alignment
Arushi Rai | Qiang Zhang | Hanqing Zeng | Yunkai Zhang | Dipesh Tamboli | Xiangjun Fan | Zhuokai Zhao | Lizhu Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight alternative, but have been explored mainly for preference alignment rather than reasoning. To bridge this gap, we propose Token-level Adaptive Routing (TARo), which steers frozen LLMs toward structured reasoning entirely at inference time. Specifically, we first train reward models on step-wise mathematical traces to capture fine-grained logical consistency signals, then introduce a learnable token-level router that automatically controls how strongly the reward model guides the base model. Extensive experiments show that TARo significantly improves reasoning performance, by up to +22.4% over the base model and +8.4% over existing token-level test-time alignment methods, while also boosting out-of-distribution clinical reasoning (MedXpertQA) and instruction following (AlpacaEval). Furthermore, TARo generalizes from small to large backbones without retraining, extending test-time alignment from preference optimization to robust, cross-domain reasoning.
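The abstract does not specify how the router combines the two models, so the following is only an illustrative sketch of token-level routing in general, not TARo's actual formulation. It assumes the router emits a scalar gate in [0, 1] per decoding step and that guidance is a linear interpolation of next-token logits; the function names `softmax` and `route_token` and the toy logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(base_logits, reward_logits, gate):
    """Blend base-model and reward-model next-token logits for one step.

    ASSUMPTION: `gate` stands in for the output of a learned router.
    gate = 0.0 trusts only the frozen base model; gate = 1.0 trusts only
    the reward model. Linear interpolation is an illustrative choice.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(base_logits) != len(reward_logits):
        raise ValueError("logit vectors must share a vocabulary")
    mixed = [(1.0 - gate) * b + gate * r
             for b, r in zip(base_logits, reward_logits)]
    return softmax(mixed)


# Toy three-token vocabulary (made-up numbers, for illustration only).
base = [2.0, 1.0, 0.0]
reward = [0.0, 1.0, 3.0]

probs_base = route_token(base, reward, gate=0.0)
probs_guided = route_token(base, reward, gate=1.0)
```

In this sketch a per-token gate lets the decoder lean on the reward model only at steps where guidance helps, whereas a single fixed weight would apply the same guidance strength to every token.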
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding
Yuhang Zhou | Mingrui Zhang | Ke Li | Mingyi Wang | Qiao Liu | Qifei Wang | Jiayi Liu | Fei Liu | Serena Li | Weiwei LI | Mingze Gao | Abhishek Kumar | Xiangjun Fan | Zhuokai Zhao | Lizhu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding and reasoning over tables is a critical capability for many real-world applications. Large language models (LLMs) have shown promise on this task, but current approaches remain limited. Fine-tuning-based methods strengthen language reasoning, yet they are prone to arithmetic errors and hallucination. In contrast, tool-based methods enable precise table manipulation but rely on rigid schemas and lack semantic understanding. These complementary drawbacks highlight the need for approaches that integrate robust reasoning with reliable table processing. In this work, we propose MIXTURE-OF-MINDS, a multi-agent framework that decomposes table reasoning into three specialized roles: planning, coding, and answering. This design enables each agent to focus on a specific aspect of the task while leveraging code execution for precise table manipulation. Building on this workflow, we introduce a self-improvement training framework that employs Monte Carlo Tree Search (MCTS) rollouts to generate pseudo-gold trajectories and optimizes the agents with reinforcement learning (RL). Extensive experiments show that MIXTURE-OF-MINDS delivers substantial gains, reaching 62.13% on TableBench and surpassing GPT-o3-mini. These results demonstrate the promise of combining structured multi-agent workflows with RL to advance table understanding.