Yonghe Lu (路永和) - ACL Anthology

Yonghe Lu

Also published as: 永和路

2026

MAST: A Multi-View Alignment Strategy for Optimal Transport-Based Contrastive Clustering of Short Text
Zijian Zheng | Yonghe Lu
Findings of the Association for Computational Linguistics: ACL 2026

Short text clustering has gained significant prominence due to its ubiquity in real-world applications. Despite the recent success of contrastive clustering, existing paradigms still suffer from two critical bottlenecks: (1) conventional data augmentation provides limited semantic granularity and may introduce unintended noise; and (2) the absence of global optimization for cluster assignments often precipitates the accumulation of pseudo-label noise, thereby compromising semantic consistency. To bridge these gaps, we propose MAST, a Multi-view Alignment Strategy with Transport-based clustering. MAST constructs complementary structural views to capture multi-granularity semantic features and introduces a multi-view contrastive objective that jointly aligns original, augmented, and structure-enhanced embeddings. To mitigate representation over-smoothing, we incorporate structure-aware negative reweighting and intermediate-layer negative sampling. Furthermore, MAST employs high-confidence guided refinement and an optimal transport-based pseudo-label alignment mechanism to enforce global semantic consistency across multiple views. Extensive experiments on several benchmark datasets demonstrate that MAST consistently outperforms state-of-the-art methods, establishing a new competitive baseline for short text clustering.

2025

FNSCC: Fuzzy Neighborhood-Aware Self-Supervised Contrastive Clustering for Short Text
Zijian Zheng | Yonghe Lu | Jian Yin
Findings of the Association for Computational Linguistics: EMNLP 2025

Short texts pose significant challenges for clustering due to semantic sparsity, limited context, and fuzzy category boundaries. Although recent contrastive learning methods improve instance-level representation, they often overlook local semantic structure within the clustering head. Moreover, treating semantically similar neighbors as negatives impairs cluster-level discrimination. To address these issues, we propose the Fuzzy Neighborhood-Aware Self-Supervised Contrastive Clustering (FNSCC) framework. FNSCC incorporates neighborhood information at both the instance level and the cluster level. At the instance level, it excludes neighbors from the negative sample set to enhance inter-cluster separability. At the cluster level, it introduces fuzzy neighborhood-aware weighting to refine soft assignment probabilities, encouraging alignment with semantically coherent clusters. Experiments on multiple benchmark short text datasets demonstrate that FNSCC consistently outperforms state-of-the-art models in accuracy and normalized mutual information. Our code is available at https://github.com/zjzone/FNSCC.

2023

Patent Semantic Similarity Calculation by Fusing Synonyms Database (融合Synonyms词库的专利语义相似度计算研究)
Xinyu Tong (佟昕瑀) | Jialun Liao (廖佳伦) | Yonghe Lu (路永和)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

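Patent similarity calculation and comparison have long been performed manually by patent examiners, who are relied on to make accurate judgments. However, manually analyzing and assessing a patent's originality, practicality, and potential infringement requires substantial human and material resources and is inefficient. Accordingly, this paper uses the ALBERT pre-trained model for patent text representation and introduces the Synonyms thesaurus to enhance the semantic expressiveness of patent text, exploring a patent text representation model and similarity calculation method based on a semantic knowledge base and deep learning. Experimental results show that the accuracy of patent text similarity measurement improves to some extent after disambiguation with the Synonyms thesaurus.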

Venues

Findings: 2
CCL: 1