Jian Yu

2026

Omni-modal Large Language Models (OLLMs) excel at diverse tasks but struggle with complex emotional reasoning, which requires integrating textual, visual, and acoustic signals. We attribute this limitation to modality collapse, where models over-rely on a dominant modality while neglecting complementary cues. To address this issue, we introduce OmniCoT, a data paradigm that interleaves Guided Tokens (GTs; e.g., [vision], [audio]) into reasoning traces to enforce structured evidence extraction. To further internalize the reasoning behaviors instilled by OmniCoT and facilitate adaptive modality prioritization, we propose Dynamic Modality-Entropy GRPO (DyME-GRPO), which uses entropy-based uncertainty estimates over GTs to regulate modality usage, thereby mitigating both modality collapse and informational redundancy. By applying supervised fine-tuning with OmniCoT followed by DyME-GRPO to the Qwen2.5-Omni-7B backbone, we develop EmoOmni. Extensive experiments demonstrate that EmoOmni achieves state-of-the-art performance on multiple emotion recognition and reasoning benchmarks while preserving the general capabilities of the base model. These findings highlight the potential of our approach for omni-modal reasoning across a broader range of complex tasks.
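DyME-GRPO builds on GRPO, whose core step scores a group of sampled responses to the same prompt and normalizes each reward against the group's mean and standard deviation. The sketch below shows that standard group-relative advantage, together with a Shannon-entropy helper of the kind one might apply to the next-token distribution at each Guided Token position. How DyME-GRPO actually turns these entropies into a modality-regulating signal is not described in the abstract; the helper names and the `gt_positions` argument are hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_guided_token_entropy(step_probs: List[Sequence[float]],
                              gt_positions: List[int]) -> float:
    """Average entropy over the decoding steps that emitted Guided Tokens.

    gt_positions is a hypothetical list of step indices where tokens such as
    [vision] or [audio] were generated; step_probs[i] is the distribution at step i.
    """
    if not gt_positions:
        return 0.0
    return sum(token_entropy(step_probs[i]) for i in gt_positions) / len(gt_positions)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: (r_i - group mean) / (group std + eps).

    Uses the population standard deviation; identical rewards yield all-zero advantages.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group with rewards `[1.0, 0.0, 1.0, 0.0]`, the advantages are approximately `[1, -1, 1, -1]`: correct responses are pushed up and incorrect ones down, relative only to their own group, which is what lets GRPO dispense with a learned value model.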
Zero-shot Relational Learning (ZRL) aims to perform knowledge graph completion for newly emerging relations that have no training instances. However, existing ZRL methods typically depend on external knowledge beyond Knowledge Graphs (KGs), which increases annotation costs and limits practical applicability. To address this issue, we propose a new **S**tructure-**A**ware paradigm for **ZRL**, termed **SAZRL**, that performs ZRL without relying on external knowledge. SAZRL leverages intrinsic structural patterns in KGs to establish semantic correlations between new relations and existing ones. It constructs structure-aware conditional query graphs based on shared entities and uses an adaptive relation updating module to generate representations for new relations from these query graphs. We conduct extensive experiments on three real-world benchmarks, **NELL-ZS**, **Wiki-ZS**, and **FB15K-ZS**, demonstrating that SAZRL consistently surpasses state-of-the-art ZRL methods, achieving up to a **10.66%** improvement in **MRR** while reducing annotation costs and enhancing practical applicability. **The code and data are provided in the supplementary materials.**
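MRR, the metric in which SAZRL reports its largest gain, averages the reciprocal of the rank assigned to the correct entity across all test queries. A minimal sketch, assuming the per-query ranks have already been computed (KG completion work commonly uses the filtered setting, but that detail is an assumption here). Hits@k is included only because it is commonly reported alongside MRR in KG completion.

```python
from typing import List


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """MRR over test queries; each rank is the 1-based position of the true entity."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of queries whose true entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For ranks `[1, 2, 4, 10]`, MRR is `(1 + 0.5 + 0.25 + 0.1) / 4 = 0.4625` and Hits@3 is `0.5`. Because the reciprocal decays quickly, MRR rewards moving the correct entity into the first few positions far more than improvements deep in the ranking.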
Large language models (LLMs) enable zero-shot and few-shot multi-label text classification via in-context learning, yet most approaches perform static inference and degrade under streaming test data due to distribution shift and long-tail labels. We study online test-time adaptation for LLM-based multi-label generation without any parameter updates, and identify two bottlenecks: (1) standard generation probabilities provide unreliable confidence because they ignore label competition at key decoding branches; (2) naive confidence-based caching overfits to frequent and easy examples, reducing label coverage and diversity. We propose SCOTTA, a structured confidence-guided online adaptation framework. SCOTTA introduces Label-set Local Likelihood Ratio (L3R), a label-level confidence measure that compares a target label against its valid competitors at critical decision positions. Using L3R as a unified signal, SCOTTA maintains an in-context exemplar cache via streaming submodular maximization, balancing label coverage, semantic diversity, and sample quality under a fixed context budget. Across four benchmarks, SCOTTA consistently improves Micro-F1 and Macro-F1 over strong LLM and non-LLM baselines, with the largest gains on long-tail labels.
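The abstract describes L3R only as comparing a target label against its valid competitors at critical decision positions. One plausible reading, shown here purely as a hypothetical sketch, is a log-ratio between the target's probability and the total probability of the other valid labels at a single decoding branch, ignoring mass assigned to tokens that are not valid labels. The function name, the `branch_probs` input format, and the `eps` smoothing are assumptions, not SCOTTA's definition.

```python
import math
from typing import Dict, Iterable


def local_likelihood_ratio(branch_probs: Dict[str, float], target: str,
                           valid_labels: Iterable[str], eps: float = 1e-12) -> float:
    """Hypothetical label-level confidence at one decoding branch.

    branch_probs maps candidate continuations (e.g. each label's first sub-token)
    to the model's probability at the branch position. Only labels in valid_labels
    compete; the score is log p(target) - log(sum of competitor probabilities).
    """
    valid = set(valid_labels)
    if target not in valid:
        raise ValueError("target must be one of the valid labels")
    p_target = branch_probs.get(target, 0.0)
    p_competitors = sum(p for label, p in branch_probs.items()
                        if label in valid and label != target)
    return math.log(p_target + eps) - math.log(p_competitors + eps)
```

With `{"sports": 0.6, "politics": 0.2, "economy": 0.1, "<eos>": 0.1}` and valid labels `sports`, `politics`, `economy`, the score for `sports` is `log(0.6 / 0.3) = log 2`. A positive score means the target outweighs all valid competitors combined; the raw generation probability of 0.6 alone would not reveal how much of the remaining mass went to competing labels rather than to invalid continuations.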
Existing psychological counseling datasets often suffer from monolithic client personas, insufficient therapeutic depth, and a lack of process controllability. To address these critical limitations, we propose PsyChain, a chain-of-agents framework that evolves static counseling corpora into high-fidelity dialogues through collaborative simulation that explicitly models client personality, stage progression, safety monitoring, and expert supervision. PsyChain includes a Client Profiler that extracts life scenarios and pairs them with psychological personality archetypes to synthesize diverse profiles. To simulate the complete counseling process, five specialized agents (Process Monitor, Client Speaker, Safety Monitor, Counselor Supervisor, and Counselor Speaker) collaborate and interact autonomously at each dialogue turn to ensure therapeutic professionalism and safety. We apply this framework to construct PsyChainD, a Chinese dataset of 10,456 dialogues featuring systematically diverse client profiles. Extensive evaluations of client-side, counselor-side, and overall dialogue quality show substantial improvements. The model trained on PsyChainD achieves 61% to 91% win rates against domain-specific baselines in pairwise evaluation and the highest average score in human evaluation, indicating its potential for real-world counseling.
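A minimal sketch of how one PsyChain dialogue turn could be orchestrated, assuming the five agents run sequentially in the order the abstract lists them and share a per-turn context. The real control flow, prompts, and LLM calls are not specified here; every agent below is a deterministic stub with invented behavior, and the context keys (`stage`, `client`, `risk`, `advice`, `counselor`) are hypothetical.

```python
from typing import Callable, Dict, List

# Each hypothetical agent reads the shared turn context and returns updates to it.
Agent = Callable[[Dict[str, str]], Dict[str, str]]


def run_turn(context: Dict[str, str], pipeline: List[Agent]) -> Dict[str, str]:
    """Run one dialogue turn by passing the shared context through each agent in order."""
    ctx = dict(context)  # copy so the caller's context is not mutated
    for agent in pipeline:
        ctx.update(agent(ctx))
    return ctx


# Stub agents standing in for LLM calls; names follow the abstract, behavior is invented.
def process_monitor(ctx: Dict[str, str]) -> Dict[str, str]:
    return {"stage": "exploration"}


def client_speaker(ctx: Dict[str, str]) -> Dict[str, str]:
    return {"client": f"[{ctx['stage']}] I have trouble sleeping."}


def safety_monitor(ctx: Dict[str, str]) -> Dict[str, str]:
    return {"risk": "high" if "hurt myself" in ctx["client"] else "low"}


def counselor_supervisor(ctx: Dict[str, str]) -> Dict[str, str]:
    advice = "escalate to crisis protocol" if ctx["risk"] == "high" else "reflect feelings"
    return {"advice": advice}


def counselor_speaker(ctx: Dict[str, str]) -> Dict[str, str]:
    return {"counselor": f"({ctx['advice']}) That sounds exhausting."}


PIPELINE: List[Agent] = [process_monitor, client_speaker, safety_monitor,
                         counselor_supervisor, counselor_speaker]
```

Calling `run_turn({"profile": "student"}, PIPELINE)` yields a context in which the Safety Monitor's verdict is visible to the Counselor Supervisor before the Counselor Speaker responds. That ordering is one way a safety check can constrain each generated counselor utterance.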

2025

Recent progress in large language models (LLMs) has opened new possibilities for mental health support, yet current approaches lack realism in simulating specialized psychotherapy and fail to capture therapeutic progression over time. Narrative therapy, which helps individuals transform problematic life stories into empowering alternatives, remains underutilized due to limited access and social stigma. We address these limitations through a comprehensive framework with two core components. First, **INT** (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate responses through retrieval augmentation. Second, **IMA** (Innovative Moment Assessment) provides a therapy-centric evaluation method that quantifies effectiveness by tracking “Innovative Moments” (IMs), critical narrative shifts in client speech that signal therapy progress. Experimental results on 260 simulated clients and 230 human participants reveal that **INT** consistently outperforms standard methods in therapeutic quality and depth. We further demonstrate the effectiveness of **INT** in synthesizing high-quality support conversations to facilitate social applications.
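IMA's exact scoring is not given in the abstract. As a hypothetical illustration of tracking IMs over a session, the sketch below assumes each client turn has already been labeled as containing an IM or not (for example by an upstream LLM judge, which is not modeled) and compares the IM rate in the first and second halves of the session. The function name `im_rates` and the half-split are inventions for illustration only.

```python
from typing import List, Tuple


def im_rates(is_im: List[bool]) -> Tuple[float, float, float]:
    """Proportion of client turns flagged as Innovative Moments.

    Returns (overall, first_half, second_half). With an odd number of turns,
    the extra turn goes to the second half. Per-turn flags are assumed to come
    from an upstream IM detector.
    """
    n = len(is_im)
    if n < 2:
        raise ValueError("need at least two client turns")
    mid = n // 2
    first, second = is_im[:mid], is_im[mid:]

    def rate(flags: List[bool]) -> float:
        return sum(flags) / len(flags)

    return rate(is_im), rate(first), rate(second)
```

For the flags `[False, False, True, False, True, True]`, the overall rate is 0.5, rising from 1/3 in the first half to 2/3 in the second. Under this simplified view, a rising rate is one signal that the client's narrative is shifting as therapy progresses.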

2024

In noisy label learning, instance selection based on the small-loss criterion has proven highly effective. However, in noisy multi-label text classification (NMLTC), noise is not limited to the instance level but extends to the level of instance-label pairs. This gives rise to two main challenges. (1) Pair-level loss information fails to capture variations between instances. (2) There are two types of pair-level noise, false positives and false negatives, and identifying false negatives within a large pool of negative pairs is exceedingly difficult. To tackle these issues, we propose a novel approach called instance-label pair correction (iLaCo), which addresses noisy pair selection and correction in NMLTC tasks. Specifically, we first introduce a holistic selection metric that identifies noisy pairs by simultaneously considering global loss information and instance-specific ranking information. Second, we employ a filter guided by label correlation that focuses exclusively on negative pairs with label relevance, which significantly reduces the difficulty of identifying false negatives. Experimental analysis indicates that our framework effectively corrects noisy pairs in NMLTC datasets, leading to a significant improvement in model performance.
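The abstract names two ingredients, a holistic metric mixing global loss with instance-specific ranking, and a label-correlation filter for negative pairs, without giving formulas. The sketch below is one hypothetical way to realize both: each pair's score is a convex combination (hypothetical weight `alpha`) of its loss percentile across all pairs and its loss percentile within its own instance, and negatives are kept only if they co-occur with one of the instance's positive labels above a hypothetical `threshold`.

```python
from typing import Dict, List, Set, Tuple


def percentile_ranks(values: List[float]) -> List[float]:
    """Map each value to rank / (n - 1) in ascending order (0 = smallest); ties keep index order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1) if n > 1 else 0.0
    return ranks


def holistic_scores(losses: List[List[float]], alpha: float = 0.5) -> List[List[float]]:
    """Score every instance-label pair; higher scores suggest a more likely noisy pair.

    losses[i][j] is the loss of pair (instance i, label j).
    """
    flat = [loss for row in losses for loss in row]
    global_pct = percentile_ranks(flat)
    scores, offset = [], 0
    for row in losses:
        local_pct = percentile_ranks(row)  # ranking within this instance only
        scores.append([alpha * global_pct[offset + j] + (1 - alpha) * local_pct[j]
                       for j in range(len(row))])
        offset += len(row)
    return scores


def correlated_negatives(positives: Set[int], num_labels: int,
                         cooccur: Dict[Tuple[int, int], float],
                         threshold: float) -> List[int]:
    """Keep only negative labels that co-occur strongly with at least one positive label."""
    return [j for j in range(num_labels)
            if j not in positives
            and any(cooccur.get((p, j), 0.0) >= threshold for p in positives)]
```

The within-instance term addresses the first challenge: a loss that looks moderate globally can still be the largest for its own instance. The filter addresses the second by shrinking the candidate pool of possible false negatives to labels that plausibly belong with the observed ones.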
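Automatic report generation has significant potential to improve work efficiency and save human resources. The emergence of large language models has improved the fluency and interpretability of generated reports. However, existing work still relies on manual effort and lacks flexibility and richness. Meanwhile, erroneous or redundant outputs from small models, together with the inherent randomness of large models, make report quality unstable. This paper proposes AutoRG, an automatic report generation framework based on collaboration between large and small models. It uses the tool-understanding and planning capabilities of large models to reduce manual intervention and enrich report content, and it improves report stability through information correction and iterative report refinement mechanisms. Taking automatic patent report generation as the application scenario, we comprehensively evaluate AutoRG along multiple dimensions. The results show that the framework offers significant advantages in improving both the richness and the quality stability of generated reports.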
Recent advances in noisy multi-label text classification have primarily relied on the class-conditional noise (CCN) assumption, which treats each label as independently undergoing label flipping to generate noisy labels. However, in real-world scenarios, noisy labels often depend on the true labels. In this study, we use hypothesis testing to show that real-world datasets are unlikely to satisfy the CCN assumption, indicating that label noise is label-dependent. To address this, we introduce a label-specific denoising framework designed to counteract label-dependent noise. The framework first introduces a holistic selection metric that evaluates noisy labels by jointly considering loss information, ranking information, and feature centroids. It then identifies and corrects noisy labels separately for each label category in a fine-grained manner. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method under both synthetic and real-world noise conditions, significantly improving performance over existing state-of-the-art models.
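The abstract does not say which hypothesis test is used. One way to probe the CCN assumption, assuming a subset of data with both clean and noisy labels is available, is to ask whether flips of label j are independent of the true value of another label k. The sketch below builds a Pearson chi-square test on a 2x2 contingency table (rows: label k truly present or absent; columns: label j flipped or not), using the fact that a chi-square variable with one degree of freedom has upper-tail probability `erfc(sqrt(x / 2))`. It does not correct for multiple comparisons across label pairs.

```python
import math
from typing import List


def chi_square_2x2(table: List[List[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table[i][j] counts items with true label k in state i and label-j flip status j.
    All row and column totals must be positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("all row and column totals must be positive")
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

For a hypothetical table `[[30, 70], [5, 95]]`, where label j flips in 30% of items with label k present but only 5% of items without it, the statistic is about 21.6 and the p-value falls far below 0.05, so independence, and with it CCN for this label pair, would be rejected.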

2022

Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translation within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative example of such challenging scenarios is deploying a translation system for a conference on a specific topic, e.g., global warming or coronavirus, where resources are usually extremely scarce due to the limited schedule. To motivate wider investigation of such scenarios, we present a real-world fine-grained domain adaptation task in machine translation (FGraDA). The FGraDA dataset consists of Chinese-English translation tasks for four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smart phone. Each sub-domain is equipped with a development set and a test set for evaluation. To be closer to reality, FGraDA does not provide any in-domain bilingual training data; instead, it provides bilingual dictionaries and a wiki knowledge base, which can be obtained more easily within a short time. We benchmark the fine-grained domain adaptation task and present in-depth analyses showing that further improving performance with heterogeneous resources remains challenging.