Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs
Che Liu, Cheng Ouyang, Zhongwei Wan, Haozhe Wang, Wenjia Bai, Rossella Arcucci
Abstract
Recent advances in multimodal representation learning for electrocardiograms (ECG) have moved toward learning representations by aligning ECG signals with their paired free-text reports. However, current methods often achieve suboptimal alignment between ECG signals and their corresponding reports, limiting diagnostic accuracy: medical language is complex and unstructured, which makes the alignment difficult to learn. Additionally, these methods cannot handle arbitrary combinations of ECG leads as inputs, which is problematic because full 12-lead ECGs may not always be available in under-resourced clinical environments.

In this work, we propose the **Knowledge-enhanced Multimodal ECG Representation Learning (K-MERL)** framework to address these challenges. K-MERL leverages large language models (LLMs) to extract structured knowledge from free-text reports, improving the effectiveness of ECG multimodal learning. Furthermore, we design a lead-aware ECG encoder with dynamic lead masking that captures the lead-specific spatial-temporal characteristics of 12-lead ECGs. This encoder allows our framework to handle arbitrary lead inputs, rather than being restricted to the full set of 12 leads that existing methods require.

We evaluate K-MERL on six external ECG datasets and demonstrate its superior capability. K-MERL not only outperforms all existing methods in zero-shot classification and linear probing tasks using 12 leads, but also achieves state-of-the-art (SOTA) results in partial-lead settings, with an average improvement of **16%** in zero-shot classification AUC over the previous SOTA multimodal method. All data and code will be released upon acceptance.
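The lead-aware encoding and dynamic lead masking described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch implementation, not the authors' code: all module names, dimensions, the patching scheme, and the random-subset masking rule are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class LeadAwareEncoder(nn.Module):
    """Sketch of a lead-aware ECG encoder that accepts arbitrary lead
    subsets. Lead identity and temporal position are injected as
    learnable embeddings; absent leads are excluded via attention
    masking. All sizes here are illustrative assumptions."""

    def __init__(self, n_leads=12, d_model=256, seq_len=1000, patch=50):
        super().__init__()
        self.patch = patch
        self.n_patches = seq_len // patch
        # Per-lead tokenizer: split each lead into patches, embed them.
        self.patch_embed = nn.Linear(patch, d_model)
        # Learnable lead embeddings carry lead-specific (spatial) identity.
        self.lead_embed = nn.Embedding(n_leads, d_model)
        # Learnable temporal position embeddings within each lead.
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_patches, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )

    def forward(self, ecg, lead_mask):
        # ecg: (B, n_leads, seq_len); lead_mask: (B, n_leads) bool,
        # True where the lead is present (arbitrary-lead input).
        B, L, _ = ecg.shape
        x = ecg.unfold(-1, self.patch, self.patch)        # (B, L, P, patch)
        x = self.patch_embed(x)                           # (B, L, P, d)
        x = x + self.pos_embed.unsqueeze(1)               # temporal position
        x = x + self.lead_embed.weight[None, :, None, :]  # lead identity
        x = x.flatten(1, 2)                               # (B, L*P, d)
        # Expand the lead mask to token level; block tokens of absent leads.
        token_mask = lead_mask.unsqueeze(-1).expand(B, L, self.n_patches)
        token_mask = token_mask.flatten(1)                # (B, L*P)
        h = self.encoder(x, src_key_padding_mask=~token_mask)
        # Mean-pool over present tokens only.
        m = token_mask.unsqueeze(-1).float()
        return (h * m).sum(1) / m.sum(1).clamp(min=1.0)

def random_lead_mask(batch, n_leads=12, keep_min=1):
    """Dynamic lead masking for training: keep a random lead subset."""
    keep = torch.randint(keep_min, n_leads + 1, (batch,))   # subset size
    scores = torch.rand(batch, n_leads)
    thresh = scores.sort(dim=1, descending=True).values.gather(
        1, (keep - 1).unsqueeze(1))
    return scores >= thresh   # exactly `keep[i]` leads kept per sample
```

In this sketch, `random_lead_mask` simulates arbitrary-lead inputs during training; at inference, `lead_mask` would simply be set from whichever leads were actually recorded.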
- Anthology ID:
- 2025.findings-emnlp.385
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7298–7316
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.385/
- DOI:
- 10.18653/v1/2025.findings-emnlp.385
- Cite (ACL):
- Che Liu, Cheng Ouyang, Zhongwei Wan, Haozhe Wang, Wenjia Bai, and Rossella Arcucci. 2025. Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7298–7316, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs (Liu et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.385.pdf