Yiming Cheng
Also published as: YiMing Cheng
2026
MTAVG-Bench: A Diagnostic Benchmark for Multi-Talker Dialogue-Centric Audio-Video Generation
Yanghao Zhou | Haitian Li | Rexar Lin | Heyan Huang | Jinxing Zhou | Changsen Yuan | Tian Lan | Ziqin Zhou | Yudong Li | Jiajun Xu | Jingyun Liao | YiMing Cheng | Xuefeng Chen | Xian-Ling Mao | Yousheng Feng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advances in text-to-audio-video (T2AV) generation have enabled models to synthesize audio-visual videos with multi-participant dialogues. However, existing evaluation benchmarks remain largely designed for human-recorded videos or single-speaker settings. As a result, structural failures in generated multi-talker dialogue videos, such as identity drift, unnatural turn transitions, and audio-visual misalignment, cannot be effectively diagnosed. To address this issue, we introduce MTAVG-Bench, a failure-driven diagnostic benchmark for multi-talker dialogue-centric audio-video generation. MTAVG-Bench is built via a semi-automatic pipeline, where 1.8k videos are generated using mainstream T2AV models with carefully designed prompts, yielding 2.4k manually annotated QA pairs for fine-grained failure diagnosis. The benchmark evaluates multi-speaker dialogue generation at four levels: audio-visual signal fidelity, temporal attribute consistency, social interaction, and cinematic expression. Built on a hierarchical failure taxonomy and a targeted QA protocol, MTAVG-Bench is primarily designed to evaluate whether proprietary and open-source omni-models can reliably identify failure modes in multi-speaker T2AV outputs. We benchmark 12 proprietary and open-source omni-models on MTAVG-Bench, with Gemini 3 Pro achieving the strongest overall performance, while leading open-source models remain competitive in signal fidelity and consistency. Overall, MTAVG-Bench enables fine-grained failure analysis for rigorous model comparison and targeted video generation refinement.
2025
Depression Detection on Social Media with Large Language Models
Xiaochong Lan | Zhiguang Han | Yiming Cheng | Li Sheng | Jie Feng | Chen Gao | Yong Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Limited access to mental healthcare resources hinders timely depression diagnosis, leading to detrimental outcomes. Social media platforms present a valuable data source for early detection, yet this task faces two significant challenges: 1) the need for medical knowledge to distinguish clinical depression from transient mood changes, and 2) the dual requirement for high accuracy and model explainability. To address this, we propose DORIS, a framework that leverages Large Language Models (LLMs). To integrate medical knowledge, DORIS utilizes LLMs to annotate user texts against established medical diagnostic criteria and to summarize historical posts into temporal mood courses. These medically-informed features are then used to train an accurate Gradient Boosting Tree (GBT) classifier. Explainability is achieved by generating justifications for predictions based on the LLM-derived symptom annotations and mood course analyses. Extensive experimental results validate the effectiveness as well as interpretability of our method, highlighting its potential as a supportive clinical tool.