Emerson Liu


2025

pdf bib
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
Yihang Yao | Zhepeng Cen | Miao Li | William Han | Yuyou Zhang | Emerson Liu | Zuxin Liu | Chuang Gan | Ding Zhao
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) have demonstrated strong reasoning capabilities across various tasks. However, even minor variations in query phrasing, despite preserving the underlying semantic meaning, can significantly affect their performance. To address this, we focus on enhancing LLMs’ awareness of symmetry in query variations and propose syMmetry-ENhanceD (MEND) data augmentation, a data-centric approach that improves the model’s ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage through query augmentation, enabling more data-efficient training and stronger generalization to Out-of-Distribution (OOD) settings. Extensive experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations, providing new insights into improving LLM robustness through structured dataset curation.

2023

pdf bib
Transfer Knowledge from Natural Language to Electrocardiography: Can We Detect Cardiovascular Disease Through Language Models?
Jielin Qiu | William Han | Jiacheng Zhu | Mengdi Xu | Michael Rosenberg | Emerson Liu | Douglas Weber | Ding Zhao
Findings of the Association for Computational Linguistics: EACL 2023

Recent advancements in Large Language Models (LLMs) have drawn increasing attention since the learned embeddings pretrained on large-scale datasets have shown powerful ability in various downstream applications. However, whether the learned knowledge by LLMs can be transferred to clinical cardiology remains unknown. In this work, we aim to bridge this gap by transferring the knowledge of LLMs to clinical Electrocardiography (ECG). We propose an approach for cardiovascular disease diagnosis and automatic ECG diagnosis report generation. We also introduce an additional loss function by Optimal Transport (OT) to align the distribution between ECG and language embedding. The learned embeddings are evaluated on two downstream tasks: (1) automatic ECG diagnosis report generation, and (2) zero-shot cardiovascular disease detection. Our approach is able to generate high-quality cardiac diagnosis reports and also achieves competitive zero-shot classification performance even compared with supervised baselines, which proves the feasibility of transferring knowledge from LLMs to the cardiac domain.