Chi Chen
2026
EchoMLLM: Incentivizing Echocardiographic Video Understanding with Keyframe Grounding and Report Generation
Heyu Huang | Wanran Sun | Chi Chen | Bo Chen | Zonghao Guo | Yuhua Li | Ruixuan Li | Kunlun He | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2026
Echocardiography analysis demands a dual capability: rigorous quantitative keyframe localization for evidence verification and comprehensive qualitative synthesis for diagnostic reporting. However, current Multi-Modal Large Language Models (MLLMs) struggle to meet these clinical requirements due to a misalignment with diagnostic workflows, a scarcity of video instruction data, and the critical challenge of cyclic temporal ambiguity—where the repetitive nature of cardiac cycles renders standard single-frame supervision ill-posed. To bridge this gap, we introduce EchoMLLM, a unified framework designed for real-world echocardiography video understanding. First, we align model capabilities with clinical needs by defining two fine-grained tasks: cycle- and pathology-conditioned keyframe grounding and video report generation. To facilitate this, we curate EchoMM-120k, a large-scale instruction dataset specifically constructed to support temporal localization and professional reporting. Furthermore, to resolve the cyclic ambiguity, we propose a multi-stage training paradigm incorporating a novel cycle-aware Reinforcement Learning (RL) strategy. By prioritizing logical consistency over rigid index matching, our approach moves beyond rote memorization to elicit invariant reasoning. Extensive experiments demonstrate that EchoMLLM reduces temporal grounding errors by up to 76% and improves report generation quality by 65% over its backbone, achieving state-of-the-art performance against both generalist and medical baselines.
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
Fuwen Luo | Shengfeng Lou | Chi Chen | Ziyue Wang | Chenliang Li | Weizhou Shen | Jiyue Guo | Peng Li | Ming Yan | Ji Zhang | Fei Huang | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Video temporal understanding is crucial for multimodal large language models (MLLMs) to reason over events in videos. Despite recent advances in general video understanding, current MLLMs still struggle with fine-grained temporal reasoning. While reinforcement learning (RL) has recently been explored to address this issue, existing RL approaches still show limited performance on time-sensitive tasks. In this work, we propose MUSEG, a novel RL-based method that enhances temporal understanding by introducing timestamp-aware multi-segment grounding. MUSEG enables MLLMs to align queries with multiple relevant video segments, promoting more comprehensive temporal reasoning. To facilitate effective learning, we design a customized RL training recipe with phased rewards that progressively guides the model toward temporally grounded reasoning. Extensive experiments on temporal grounding and time-sensitive video question answering (QA) tasks demonstrate that MUSEG significantly outperforms existing methods and generalizes well across diverse temporal understanding scenarios.
RSMeM: Knowledge-Enhanced Memory Evolution for Remote Sensing Agents with Systematic Evaluation
Bingxian Wu | Yu Zhang | Zonghao Guo | Tang Liu | Chen Qian | Yuxiang Lu | Xingbo Du | Yanghao Li | Yidan Zhang | Chi Chen | Ling Yao | Chenghu Zhou | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Geoscience research requires complex analysis and domain expertise, with remote sensing (RS) observations as a key foundation. However, existing RS agents built on general-purpose LLMs remain largely domain-agnostic, resulting in brittle and error-prone workflows. Moreover, these failures are seldom consolidated into reusable experience for subsequent analyses. To address this issue, we introduce RSMeM, a knowledge-enhanced memory evolution mechanism that bootstraps RS agents with pre-distilled domain knowledge and iteratively integrates online experience for robust multi-step tool execution. RSMeM consists of two components: (i) Hierarchical Knowledge Grounding, which performs taxonomy-aware retrieval over a hierarchical domain corpus to guide planning and tool selection; and (ii) Failure-Aware Experience Refinement, which distills failure-annotated tool-use traces into reusable constraints for the next round of tool execution. By iterating these two processes, RS agents evolve to absorb task-level domain knowledge and translate it into instance-level execution experience. Extensive experiments on EarthBench demonstrate that RSMeM consistently improves tool-use performance and end-to-end accuracy across a diverse set of LLM backbones. Notably, RSMeM achieves a 6% accuracy improvement on DeepSeek-V3.2 with fewer than 1% additional experience tokens, demonstrating the knowledge density of the distilled experience. All code and models will be released to support reproducible research.
2025
Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
Xinyue Lou | You Li | Jinan Xu | Xiangyu Shi | Chi Chen | Kaiyu Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The rapid development of Multimodal Large Reasoning Models (MLRMs) has demonstrated broad application potential, yet their safety and reliability remain critical concerns that require systematic exploration. To address this gap, we conduct a comprehensive and systematic safety evaluation of 13 MLRMs across 5 benchmarks and unveil prevalent safety degradation in most advanced models. Moreover, our analysis shows distinct safety patterns across benchmarks: safety degrades significantly on jailbreak robustness benchmarks, whereas degradation is less pronounced on safety-awareness benchmarks. In particular, the long thought process even enhances safety performance in some scenarios. This suggests that leveraging the model's intrinsic reasoning capabilities to detect unsafe intent is a promising way to address safety issues in MLRMs. To operationalize this insight, we construct a multimodal tuning dataset that incorporates a safety-oriented thought process. Experimental results show that fine-tuning existing MLRMs on this dataset effectively enhances safety on both jailbreak robustness and safety-awareness benchmarks. This study provides a new perspective for developing safe MLRMs.
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang | Yaxi Lu | Yikun Fu | Yupeng Huo | Shenzhi Yang | Yesai Wu | Han Si | Xin Cong | Haotian Chen | Yankai Lin | Jie Xie | Wei Zhou | Wang Xu | Yuanheng Zhang | Zhou Su | Zhongwu Zhai | Xiaoming Liu | Yudong Mei | Jianming Xu | Hongyan Tian | Chongyi Wang | Chi Chen | Yuan Yao | Zhiyuan Liu | Maosong Sun
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large language model agents have enabled GUI-based automation, particularly for mobile devices. However, deployment remains limited by noisy data, poor generalization, and a lack of support for non-English GUIs. In this work, we present AgentCPM-GUI, an 8B-parameter GUI agent built for robust and efficient on-device GUI interaction. Our training pipeline includes grounding-aware pre-training to enhance perception, supervised fine-tuning on high-quality Chinese and English trajectories to imitate human-like actions, and reinforcement fine-tuning with GRPO to improve reasoning capability. AgentCPM-GUI achieves promising performance on five public benchmarks and on our proposed Chinese benchmark, CAGUI. To facilitate reproducibility and further research, we publicly release all code, model checkpoints, and evaluation data at: https://github.com/OpenBMB/AgentCPM-GUI
Co-authors
- Maosong Sun (孙茂松) 3
- Zonghao Guo 2
- Bo Chen 1
- Haotian Chen 1
- Xin Cong 1
- Xingbo Du 1
- Yikun Fu 1
- Jiyue Guo 1
- Kunlun He 1
- Fei Huang 1
- Heyu Huang 1
- Kaiyu Huang (黄锴宇) 1
- Yupeng Huo 1
- Chenliang Li 1
- Peng Li 1
- Ruixuan Li 1
- Yanghao Li 1
- You Li (李铀) 1
- Yuhua Li 1
- Yankai Lin (林衍凯) 1
- Tang Liu 1
- Xiaoming Liu 1
- Yang Liu 1
- Zhiyuan Liu 1
- Shengfeng Lou 1
- Xinyue Lou (娄馨月) 1
- Yaxi Lu 1
- Yuxiang Lu 1
- Fuwen Luo 1
- Yudong Mei 1
- Chen Qian 1
- Weizhou Shen 1
- Xiangyu Shi (石响宇) 1
- Han Si 1
- Zhou Su 1
- Wanran Sun 1
- Hongyan Tian 1
- Chongyi Wang 1
- Ziyue Wang 1
- Bingxian Wu 1
- Yesai Wu 1
- Jie Xie 1
- Jianming Xu 1
- Jinan Xu (徐金安) 1
- Wang Xu 1
- Ming Yan 1
- Shenzhi Yang 1
- Ling Yao 1
- Yuan Yao 1
- Zhongwu Zhai 1
- Ji Zhang 1
- Yidan Zhang 1
- Yu Zhang 1
- Yuanheng Zhang 1
- Zhong Zhang 1
- Chenghu Zhou 1
- Wei Zhou 1