Qinghai Miao
2025
Rank-Awareness and Angular Constraints: A New Perspective on Learning Sentence Embeddings from NLI Data
Zicheng Zhou
|
Min Huang
|
Qinghai Miao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Learning high-quality sentence embeddings from Natural Language Inference (NLI) data is often challenged by a critical signal conflict between discrete labels and the continuous spectrum of semantic similarity, as well as information loss from discarded neutral sentence pairs during training. To address this, we introduce Rank-Awareness and Angular Optimization Embeddings (RAOE), a framework that leverages the full NLI dataset (Entailment, Neutral, Contradiction) augmented with pre-computed continuous similarity scores (S). RAOE employs a novel composite objective which features: (1) a Rank Margin objective that enforces rank consistency against S using an explicit margin, and (2) a Gated Angular objective that conditionally refines embedding geometry based on NLI label (L) and S score agreement. Extensive evaluations on STS tasks and the MTEB benchmark demonstrate RAOE’s effectiveness. Our general-purpose RAOE-S1 model (BERT-base) significantly outperforms strong baselines, achieving an average Spearman’s correlation of 85.11 (vs. SimCSE’s 81.57 and AnglE’s 82.43), and shows consistent improvements on MTEB. Further STS-specialized fine-tuning (RAOE-S2) establishes new state-of-the-art performance on STS (88.17 with BERT-base). These results confirm RAOE’s ability to efficiently learn robust and nuanced sentence representations through the synergy of rank-awareness and conditional angular constraints. Code is available at https://github.com/Shengjingwa/RAOE.
2024
DimA: A Parameter-efficient Fine-tuning Method with Knowledge Transfer Based on Transformer
Wenxuan Zhang
|
Min Huang
|
Zhuoyang Song
|
Qinghai Miao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Fine-tuning is a widely used technique for leveraging pre-trained language models (PLMs) in downstream tasks, but it can be computationally expensive and storage-intensive. To address this challenge, researchers have developed parameter-efficient methods that balance performance and resource cost. However, these methods often come with trade-offs like increased inference latency, token length usage, or limited adaptability for multitasking scenarios. This paper introduces a novel parameter-efficient method called DimA(Dimensionality Augmentation), which enhances the Transformer architecture by increasing the dimensionality. DimA achieves state-of-the-art results in GLUE and XSUM tasks while utilizing less than 1% of the original model’s parameters. Moreover, DimA introduces a novel approach to knowledge transfer that enables the simultaneous utilization of knowledge learned from multiple tasks to handle new tasks. This method significantly enhances the performance of the model on new tasks. Its versatility in model structure also enables its application to various Transformer-based models.