Weiwei LI
Also published as: Weiwei Li
2026
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding
Yuhang Zhou | Mingrui Zhang | Ke Li | Mingyi Wang | Qiao Liu | Qifei Wang | Jiayi Liu | Fei Liu | Serena Li | Weiwei LI | Mingze Gao | Abhishek Kumar | Xiangjun Fan | Zhuokai Zhao | Lizhu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding and reasoning over tables is a critical capability for many real-world applications. Large language models (LLMs) have shown promise on this task, but current approaches remain limited. Fine-tuning-based methods strengthen language reasoning, yet they are prone to arithmetic errors and hallucination. In contrast, tool-based methods enable precise table manipulation but rely on rigid schemas and lack semantic understanding. These complementary drawbacks highlight the need for approaches that integrate robust reasoning with reliable table processing. In this work, we propose MIXTURE-OF-MINDS, a multi-agent framework that decomposes table reasoning into three specialized roles: planning, coding, and answering. This design enables each agent to focus on a specific aspect of the task while leveraging code execution for precise table manipulation. Building on this workflow, we introduce a self-improvement training framework that employs Monte Carlo Tree Search (MCTS) rollouts to generate pseudo-gold trajectories and optimize agents with reinforcement learning (RL). Extensive experiments show that MIXTURE-OF-MINDS delivers substantial gains, reaching 62.13% on TableBench and surpassing GPT-o3-mini. These results demonstrate the promise of combining structured multi-agent workflows with RL to advance table understanding.
2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu | Xuzheng Yang | Weiwei Li | Peng Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding. Consequently, it serves as an ideal testing ground for Multi-modal Large Language Models (MLLMs). In pursuit of this goal, we have established a new REC dataset characterized by two key features. First, it is designed with controllable, varying levels of difficulty, necessitating multi-level fine-grained reasoning across object categories, attributes, and multi-hop relationships. Second, it includes negative text and images created through fine-grained editing and generation based on existing data, thereby testing the model’s ability to correctly reject scenarios where the target object is not visible in the image, an essential aspect often overlooked in existing datasets and approaches. Utilizing this high-quality dataset, we conducted comprehensive evaluations of both state-of-the-art specialist models and MLLMs. Our findings indicate that there remains a significant gap in achieving satisfactory grounding performance. We anticipate that our dataset will inspire new approaches to enhance visual reasoning and develop more advanced cross-modal interaction strategies, ultimately unlocking the full potential of MLLMs.