Xuesong Qiu

2026

Leveraging powerful planning and reasoning capabilities, Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated remarkable scalability and generalizability across complex tasks. However, dynamically selecting the optimal combination of agents and collaboration modes for a given query, so as to balance performance and cost, remains challenging. Prior work focuses on single-agent settings and overlooks collaborative structures and role assignment in MAS. To address this limitation, we propose RouterHGC, the first heterogeneous graph contrastive learning framework for MAS routing. We formalize routing as node selection through edge-weight prediction on a heterogeneous graph whose node types include user queries, collaboration modes, agent roles, and LLMs, with message passing capturing their high-order dependencies. We further design a novel global–local contrastive loss function to jointly optimize graph-level representations and edge-level selections, pulling each query graph toward high-performing positives while pushing it away from underperforming or costly negatives. Experiments on five public datasets covering mathematical reasoning, code generation, and knowledge question answering show that RouterHGC outperforms both the best single LLM and the baselines, achieving 0.80%–6.17% accuracy gains on MATH and HotpotQA while reducing inference cost by 27.40%.

2025

Multimedia Event Extraction with LLM Knowledge Editing
Jiaao Yu | Yijing Lin | Zhipeng Gao | Xuesong Qiu | Lanlan Rui
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The multimodal event extraction task aims to identify event types and arguments from visual and textual representations of events. Because multimedia training data is costly, previous methods have mainly relied on weak alignment of strong unimodal encoders. However, they ignore the conflict between event understanding and image recognition, so redundant feature perception impairs the understanding of multimodal events. In this paper, we propose a multimodal event extraction strategy with a multi-level redundant feature selection mechanism, which enhances the event understanding ability of multimodal large language models by leveraging knowledge editing techniques and requires no additional parameter optimization. Extensive experiments show that our method outperforms the state-of-the-art (SOTA) baselines on the M2E2 benchmark. Compared with the strongest baseline, we achieve a 34% improvement in precision on event extraction and an 11% improvement in F1 on argument extraction.

Venues

EMNLP (1)
Findings (1)