Bo Chen
2026
EchoMLLM: Incentivizing Echocardiographic Video Understanding with Keyframe Grounding and Report Generation
Heyu Huang | Wanran Sun | Chi Chen | Bo Chen | Zonghao Guo | Yuhua Li | Ruixuan Li | Kunlun He | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2026
Echocardiography analysis demands a dual capability: rigorous quantitative keyframe localization for evidence verification and comprehensive qualitative synthesis for diagnostic reporting. However, current Multi-Modal Large Language Models (MLLMs) struggle to meet these clinical requirements due to a misalignment with diagnostic workflows, a scarcity of video instruction data, and the critical challenge of cyclic temporal ambiguity—where the repetitive nature of cardiac cycles renders standard single-frame supervision ill-posed. To bridge this gap, we introduce EchoMLLM, a unified framework designed for real-world echocardiography video understanding. First, we align model capabilities with clinical needs by defining two fine-grained tasks: cycle- and pathology-conditioned keyframe grounding and video report generation. To facilitate this, we curate EchoMM-120k, a large-scale instruction dataset specifically constructed to support temporal localization and professional reporting. Furthermore, to resolve the cyclic ambiguity, we propose a multi-stage training paradigm incorporating a novel cycle-aware Reinforcement Learning (RL) strategy. By prioritizing logical consistency over rigid index matching, our approach moves beyond rote memorization to elicit invariant reasoning. Extensive experiments demonstrate that EchoMLLM reduces temporal grounding errors by up to 76% and improves report generation quality by 65% over its backbone, achieving state-of-the-art performance against both generalist and medical baselines.
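To make the cyclic temporal ambiguity concrete, the sketch below contrasts rigid index matching with a comparison that treats frames a whole number of cardiac cycles apart as equivalent. It is an illustration only, not the reward used in EchoMLLM: the abstract does not specify the formulation, and the function names, the fixed `cycle_len`, and the linear `tolerance` falloff are all assumptions introduced here.

```python
def cyclic_frame_error(pred: int, target: int, cycle_len: int) -> int:
    """Smallest frame offset between pred and target when frames that are
    a whole number of cycles apart are treated as equivalent.

    Hypothetical helper: assumes a constant cycle length in frames.
    """
    if cycle_len <= 0:
        raise ValueError("cycle_len must be positive")
    diff = (pred - target) % cycle_len
    return min(diff, cycle_len - diff)


def cycle_aware_reward(pred: int, target: int, cycle_len: int,
                       tolerance: int = 2) -> float:
    """Illustrative reward in [0, 1]: 1.0 for a phase-exact prediction,
    decreasing linearly to 0.0 once the cyclic error exceeds `tolerance`.

    Assumption: the linear falloff is chosen for simplicity, not taken
    from the paper.
    """
    err = cyclic_frame_error(pred, target, cycle_len)
    return max(0.0, 1.0 - err / (tolerance + 1))


if __name__ == "__main__":
    target, cycle_len = 12, 30
    for pred in (12, 72, 14, 27):
        rigid = abs(pred - target)
        cyclic = cyclic_frame_error(pred, target, cycle_len)
        reward = cycle_aware_reward(pred, target, cycle_len)
        print(f"pred={pred:3d} rigid_err={rigid:3d} "
              f"cyclic_err={cyclic:2d} reward={reward:.2f}")
```

Under these assumptions, a prediction of frame 72 against an annotated frame 12 with a 30-frame cycle is 60 frames off by rigid index matching but has zero cyclic error, since both frames fall at the same phase of the cardiac cycle. Real recordings have variable cycle lengths, so a practical version would need per-cycle boundaries rather than a single constant.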