Qi Feng


2026

Leveraging powerful planning and reasoning capabilities, Multi-Agent Systems (MAS) driven by Large Language Models (LLMs) have demonstrated remarkable scalability and generalizability across complex tasks. However, dynamically routing each query to the optimal combination of agents and collaboration modes while balancing performance and cost remains challenging. Prior work focuses on single-agent settings and overlooks collaborative structures and role assignment in MAS. To address this limitation, we propose RouterHGC, the first heterogeneous graph contrastive learning framework for MAS routing. We formalize routing as node selection through edge-weight prediction on a heterogeneous graph whose node types include user queries, collaboration modes, agent roles, and LLMs, with message passing capturing their high-order dependencies. We further design a novel global–local contrastive loss function to jointly optimize graph-level representations and edge-level selections, pulling each query graph toward high-performing positives while pushing it away from underperforming or costly negatives. Experiments on five public datasets covering mathematical reasoning, code generation, and knowledge question answering show that RouterHGC outperforms the best single LLM and baselines, achieving 0.80%–6.17% accuracy gains on MATH and HotpotQA while reducing inference cost by 27.40%.
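The pull-toward-positives, push-from-negatives idea can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch, not RouterHGC's global–local loss, whose exact form the abstract does not give. The function names, the cosine similarity, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (illustrative; not the paper's loss).

    The loss is low when the anchor is close to the positive and
    far from every negative, and high in the opposite case.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-d embeddings: a query graph, a high-performing
# configuration, and a costly or underperforming one.
query = [1.0, 0.0]
good_config = [1.0, 0.0]
bad_config = [0.0, 1.0]

aligned_loss = info_nce(query, good_config, [bad_config])
swapped_loss = info_nce(query, bad_config, [good_config])
```

In this toy setup `aligned_loss` is close to zero and `swapped_loss` is large, which is the behaviour a contrastive objective of this kind is meant to reward.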

2025

Curriculum learning is a widely adopted training strategy in natural language processing (NLP), where models are exposed to examples organized by increasing difficulty to enhance learning efficiency and performance. However, most existing approaches rely on manually defined difficulty metrics – such as text length – which may not accurately reflect the model’s own perspective. To overcome this limitation, we present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models (PLMs) themselves. Building on these scores, we explore training strategies that differ in how examples are ordered during fine-tuning: easy-to-hard, hard-to-easy, and mixed sampling. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks. Experimental results show that our approach leads to faster convergence and improved performance compared to standard random sampling.
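The three orderings named above can be sketched as follows. This is a minimal illustration that assumes difficulty scores are already available as floats, with higher meaning harder. The function name `order_by_difficulty` is hypothetical, and "mixed" is implemented as a seeded shuffle, which may differ from the sampling scheme the paper uses.

```python
import random


def order_by_difficulty(examples, scores, strategy="easy_to_hard", seed=0):
    """Return examples reordered according to a curriculum strategy.

    scores[i] is the (assumed) model-predicted difficulty of examples[i].
    """
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    # Sort indices by ascending difficulty; ties keep their input order.
    indices = sorted(range(len(examples)), key=lambda i: scores[i])
    if strategy == "easy_to_hard":
        pass
    elif strategy == "hard_to_easy":
        indices.reverse()
    elif strategy == "mixed":
        # Assumption: "mixed" interleaves difficulties via a seeded shuffle.
        random.Random(seed).shuffle(indices)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [examples[i] for i in indices]


# Hypothetical fine-tuning examples with made-up difficulty scores.
texts = ["a", "b", "c"]
difficulty = [0.9, 0.1, 0.5]
easy_first = order_by_difficulty(texts, difficulty, "easy_to_hard")
hard_first = order_by_difficulty(texts, difficulty, "hard_to_easy")
mixed = order_by_difficulty(texts, difficulty, "mixed")
```

The reordered list would then be fed to the fine-tuning loop in place of a randomly shuffled one.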
Large Language Models (LLMs) have shown remarkable capabilities in role-playing dialogues, yet they often struggle to maintain emotionally consistent and psychologically plausible character personalities. We present MECoT (Markov Emotional Chain-of-Thought), a framework that enhances LLMs’ ability to generate authentic personality-driven dialogues through stochastic emotional transitions. Inspired by dual-process theory, MECoT combines a Markov-chain-driven emotional processor for intuitive responses with an LLM-based reasoning mechanism for rational regulation, mapped onto a 12-dimensional Emotion Circumplex Model. The framework dynamically adjusts emotional transitions using personality-weighted matrices and historical context, ensuring both emotional coherence and character consistency. We introduce the Role-playing And Personality Dialogue (RAPD) dataset, featuring diverse character interactions with fine-grained emotional annotations, along with novel metrics for evaluating emotional authenticity and personality alignment. Experimental results demonstrate MECoT’s effectiveness, achieving 93.3% emotional accuracy on RAPD and substantially outperforming existing approaches. Our analysis reveals optimal emotional granularity (12-16 categories) and validates our data-driven personality optimization approach. Code and data are available at https://anonymous.4open.science/r/MECoT
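A stochastic emotional transition with personality weighting can be sketched as below. This is a toy illustration under stated assumptions, not MECoT's implementation. The three emotion labels, the transition probabilities, the personality weights, and both function names are invented for the example, and the LLM-based regulation step is omitted.

```python
import random


def personality_weighted_row(base_row, weights):
    """Rescale one transition row by per-emotion weights and renormalize."""
    raw = [p * w for p, w in zip(base_row, weights)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weighted row has no probability mass")
    return [r / total for r in raw]


def next_emotion(current, emotions, transition, weights, rng):
    """Sample the next emotion from the personality-weighted Markov row."""
    row = personality_weighted_row(transition[current], weights)
    return rng.choices(emotions, weights=row, k=1)[0]


# Hypothetical 3-state example (the paper maps onto 12 dimensions).
emotions = ["calm", "joy", "anger"]
transition = {
    "calm": [0.6, 0.3, 0.1],
    "joy": [0.3, 0.6, 0.1],
    "anger": [0.4, 0.1, 0.5],
}
# Assumed personality profile: this character rarely becomes angry.
weights = [1.0, 1.0, 0.2]

rng = random.Random(0)
row = personality_weighted_row(transition["calm"], weights)
sampled = next_emotion("calm", emotions, transition, weights, rng)
```

Down-weighting a column lowers the chance of entering that state from any current emotion, which is one simple way a personality profile could bias the chain.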

2024

In this paper, we describe our submission for the NLI4CT 2024 shared task on robust Natural Language Inference over clinical trial reports. Our system is an ensemble of nine diverse models, which we aggregate via majority voting. The models cover a broad spectrum of approaches, ranging from a straightforward Convolutional Neural Network through fine-tuned Large Language Models to few-shot-prompted language models using chain-of-thought reasoning. Surprisingly, we find that some individual ensemble members are not only more accurate than the final ensemble model but also more robust.
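Majority voting over the members' predictions can be sketched as follows. This is a minimal illustration, not the submitted system: the function name `majority_vote` and the example votes are made up. The tie-breaking rule (the earliest-listed label among the tied ones wins) is an assumption, and with nine voters and two labels a tie cannot occur anyway.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the members' predictions.

    Ties are broken in favour of the label that appears first in the
    input list (an illustrative choice, not taken from the paper).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical votes from nine ensemble members for one instance.
votes = ["Entailment"] * 5 + ["Contradiction"] * 4
final_label = majority_vote(votes)
```

Because the vote only counts labels, a strong individual member can be outvoted by several weaker ones, which is consistent with the observation that some members outperform the ensemble.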