Ke Yuan
2026
SCOPE: Preserving Modality-Specific Cues to Mitigate Modality Laziness in Multimodal Learning
Jingfan Yang | Rui Zhang | Liang Hong | Ke Yuan
Findings of the Association for Computational Linguistics: ACL 2026
Multimodal learning aims to learn unified multimodal representations from heterogeneous modalities and supports many natural language processing tasks. However, multimodal models often exhibit modality laziness: they over-rely on a dominant modality and under-exploit complementary signals. Existing approaches typically strengthen unimodal training or rebalance modality contributions, but they may still emphasize shared semantics and overlook modality-specific cues. To address this, we propose SCOPE, a unified framework for learning complete multimodal representations through Shared-and-COmplementary cue PrEservation. First, SCOPE uses a mutual information-guided disentanglement module to separate shared semantics from modality-specific cues and mitigate representation collapse. Second, SCOPE aligns modalities by enforcing structural consistency between modality-wise semantic graphs, avoiding brittle point-wise matching. Finally, SCOPE performs balanced fusion via structure-aware diffusion attention to integrate shared and complementary cues without feature homogenization. Experiments on four benchmark datasets show that SCOPE consistently outperforms state-of-the-art baselines, achieving up to 27.10% accuracy improvement.
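To make the contrast between structural consistency and point-wise matching concrete, the following is a minimal illustrative sketch and not the SCOPE implementation. It assumes that a "semantic graph" can be approximated by a dense cosine-similarity matrix over the samples of one modality, and that consistency is measured as the mean squared difference between two such matrices. The function names (`similarity_graph`, `structural_gap`) and the toy features are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(features):
    """Dense graph for one modality: entry [i][j] is the cosine
    similarity between sample i and sample j (an assumed stand-in
    for a modality-wise semantic graph)."""
    return [[cosine(a, b) for b in features] for a in features]


def structural_gap(feats_a, feats_b):
    """Mean squared difference between the two modality-wise graphs.
    The two modalities may live in different feature spaces; only the
    pairwise relations among samples are compared."""
    graph_a = similarity_graph(feats_a)
    graph_b = similarity_graph(feats_b)
    n = len(graph_a)
    total = sum(
        (graph_a[i][j] - graph_b[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


# Toy features for three samples (hypothetical values).
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The same geometry rotated by 90 degrees: point-wise the vectors differ,
# but the pairwise relations among samples are identical.
image_feats = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
# A collapsed modality in which every sample has the same feature.
collapsed_feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

aligned_gap = structural_gap(text_feats, image_feats)
collapsed_gap = structural_gap(text_feats, collapsed_feats)
```

In this toy setup the rotated modality yields a gap of essentially zero even though no feature vector matches its counterpart point-wise, while the collapsed modality yields a clearly positive gap.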
2025
FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction
Zhaohan Meng | Zaiqiao Meng | Ke Yuan | Iadh Ounis
Findings of the Association for Computational Linguistics: EMNLP 2025
Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and proteins, i.e., the binding of specific drug atoms (or substructures) and key amino acids of proteins, which is crucial for understanding binding mechanisms and optimising drug design. To address this issue, this paper introduces a novel model, called FusionDTI, which uses a token-level **Fusion** module to effectively learn fine-grained information for **D**rug-**T**arget **I**nteraction. In particular, FusionDTI uses the SELFIES representation of drugs to mitigate sequence fragment invalidation and incorporates the structure-aware (SA) vocabulary of target proteins to address the limited structural information in amino acid sequences. It additionally leverages pre-trained language models, extensively trained on large-scale biomedical datasets, as encoders to capture the complex information of drugs and targets. Experiments on three well-known benchmark datasets show that FusionDTI achieves the best performance in DTI prediction compared with eight existing state-of-the-art baselines. Furthermore, our case study indicates that FusionDTI can highlight potential binding sites, enhancing the explainability of DTI prediction.
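As a rough illustration of what token-level fusion between drug and protein tokens can look like, the sketch below uses plain scaled dot-product cross-attention. This is an assumption made for illustration only: the abstract does not specify the internals of the fusion module, and the function name `cross_attention` and the toy token embeddings are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys):
    """Scaled dot-product cross-attention without learned projections.

    Returns (weights, fused), where weights[i][j] is the attention that
    query token i pays to key token j, and fused[i] is the
    attention-weighted average of the key tokens for query token i.
    """
    dim = len(keys[0])
    weights, fused = [], []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        fused.append(
            [sum(w[j] * keys[j][c] for j in range(len(keys))) for c in range(dim)]
        )
    return weights, fused


# Hypothetical token embeddings: two drug tokens, three protein residue tokens.
drug_tokens = [[1.0, 0.0], [0.0, 1.0]]
protein_tokens = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

weights, fused = cross_attention(drug_tokens, protein_tokens)
# For each drug token, the index of the protein token with the largest weight.
top_residue = [row.index(max(row)) for row in weights]
```

Under this sketch, the `weights` matrix is the kind of token-to-token map one could inspect to see which residues a given drug token attends to most, which is one plausible route to the binding-site highlighting mentioned in the abstract.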