2025
Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering
Zifeng Cheng | Zhonghui Wang | Yuchen Fu | Zhiwei Jiang | Yafeng Yin | Cong Wang | Qing Gu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Extracting sentence embeddings from large language models (LLMs) is a practical direction, as it requires neither additional data nor fine-tuning. Previous studies usually focus on prompt engineering to guide LLMs to encode the core semantic information of the sentence into the embedding of the last token. However, the last token in these methods still encodes an excess of non-essential information, such as stop words, limiting its encoding capacity. To this end, we propose a Contrastive Prompting (CP) technique that introduces an extra auxiliary prompt to elicit better sentence embeddings. By contrasting with the auxiliary prompt, CP can steer existing prompts to encode the core semantics of the sentence rather than non-essential information. CP is a plug-and-play inference-time intervention method that can be combined with various prompt-based methods. Extensive experiments on Semantic Textual Similarity (STS) tasks and downstream classification tasks demonstrate that our method can improve the performance of existing prompt-based methods across different LLMs.
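The contrast idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say where or how the intervention is applied inside the model, so the example below assumes, purely for illustration, that the contrast is taken directly on final embedding vectors. The function names `contrast` and `cosine`, the scale `alpha`, and the three-dimensional vectors are all hypothetical.

```python
import math

def contrast(main_vec, aux_vec, alpha=1.0):
    """Push the main-prompt vector away from the auxiliary-prompt vector.

    Assumption: a simple linear contrast, main + alpha * (main - aux).
    The paper's actual steering rule is not given in the abstract.
    """
    return [m + alpha * (m - a) for m, a in zip(main_vec, aux_vec)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Toy vectors: dimensions 0 and 1 stand for meaning, dimension 2 for
# non-essential content (e.g. stop words) shared by both sentences.
main_a, aux_a = [1.0, 0.1, 0.8], [0.1, 0.0, 0.8]
main_b, aux_b = [0.1, 1.0, 0.8], [0.0, 0.1, 0.8]

before = cosine(main_a, main_b)
after = cosine(contrast(main_a, aux_a), contrast(main_b, aux_b))
print(f"similarity before contrast: {before:.3f}, after: {after:.3f}")
```

In this toy setup the two sentences differ in meaning but share the same filler component, so their raw similarity is inflated; after the contrast the shared component carries less relative weight and the similarity drops.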
Multi-Prompting Decoder Helps Better Language Understanding
Zifeng Cheng | Zhaoling Chen | Zhiwei Jiang | Yafeng Yin | Cong Wang | Shiping Ge | Qing Gu
Findings of the Association for Computational Linguistics: ACL 2025
Recent large Pre-trained Language Models (PLMs) usually provide users only with inference APIs, namely the emerging Model-as-a-Service (MaaS) setting. To adapt MaaS PLMs to downstream tasks without accessing their parameters and gradients, some existing methods focus on the output-side adaptation of PLMs, viewing the PLM as an encoder and then optimizing a task-specific decoder for decoding the output hidden states and class scores of the PLM. Despite the effectiveness of these methods, they use only a single prompt to query PLMs for decoding, leading to a heavy reliance on the quality of the adopted prompt. In this paper, we propose a simple yet effective Multi-Prompting Decoder (MPD) framework for MaaS adaptation. The core idea is to query PLMs with multiple different prompts for each sample, thereby obtaining multiple output hidden states and class scores from PLMs for subsequent decoding. Such a multi-prompting decoding paradigm can simultaneously mitigate reliance on the quality of a single prompt, alleviate the issue of data scarcity under the few-shot setting, and provide richer knowledge extracted from PLMs. Specifically, we propose two decoding strategies: multi-prompting decoding with optimal transport for hidden states and calibrated decoding for class scores. Extensive experiments demonstrate that our method achieves new state-of-the-art results on multiple natural language understanding datasets under the few-shot setting.
Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models
Jiangxu Wu | Cong Wang | Tianhuang Su | Lin Haozhi | JunYang JunYang | Zhangchao Zhangchao | Binqiang Pan | SongpanYang SongpanYang | Mingpeng Mingpeng | Kai Shi | Zixian Li
Findings of the Association for Computational Linguistics: ACL 2025
The effectiveness of large language models (LLMs) in conversational AI is hindered by their reliance on single-turn supervised fine-tuning (SFT) data, which limits contextual coherence in multi-turn dialogues. Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. To address this, we propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative “Ask-Respond-Review” process involving three agent roles: a Candidate, multiple Reviewers, and a Chairman. The framework iteratively refines instructions by incorporating Reviewer feedback, enhancing dialogue diversity and difficulty. We construct a multi-turn dataset using the Alpaca dataset and fine-tune the LLaMA2-13B model. Evaluations on MT-Bench, MMLU-Pro, and Auto-Arena demonstrate significant improvements, achieving absolute gains of 2.9% on MMLU-Pro and 2% on MT-Bench compared to prior state-of-the-art models based on LLaMA2-13B. Ablation studies confirm the critical role of the Review stage and the use of multiple Reviewers in boosting instruction diversity and difficulty. Our work highlights the potential of review-driven, multi-agent frameworks for generating high-quality conversational data at scale.
2023
Aggregating Multiple Heuristic Signals as Supervision for Unsupervised Automated Essay Scoring
Cong Wang | Zhiwei Jiang | Yafeng Yin | Zifeng Cheng | Shiping Ge | Qing Gu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automated Essay Scoring (AES) aims to score the quality of input essays. In this work, we propose a novel unsupervised AES approach, ULRA, which does not require ground-truth scores of essays for training. The core idea of ULRA is to use multiple heuristic quality signals as the pseudo-ground-truth, and then train a neural AES model by learning from the aggregation of these quality signals. To aggregate these inconsistent quality signals into a unified supervision, we view the AES task as a ranking problem, and design a special Deep Pairwise Rank Aggregation (DPRA) loss for training. In the DPRA loss, we set a learnable confidence weight for each signal to address the conflicts among signals, and train the neural AES model in a pairwise way to disentangle the cascade effect among partial-order pairs. Experiments on eight prompts of the ASAP dataset show that ULRA achieves state-of-the-art performance compared with previous unsupervised methods in both transductive and inductive settings. Further, our approach achieves comparable performance with many existing domain-adapted supervised models, showing the effectiveness of ULRA. The code is available at https://github.com/tenvence/ulra.
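To make the pairwise aggregation idea concrete, here is a minimal sketch of a confidence-weighted pairwise ranking loss. It is a simplified stand-in, not the DPRA loss from the paper or the linked repository: it assumes softmax-normalised confidence weights and a logistic pairwise loss, and the function name `pairwise_rank_aggregation_loss` and all inputs are hypothetical. In real training the confidence logits would be learned by gradient descent; here they are plain numbers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pairwise_rank_aggregation_loss(pred, signals, confidence_logits):
    """Average weighted pairwise loss of predicted scores against several signals.

    pred: predicted score per essay.
    signals: one list of heuristic signal values per signal, aligned with pred.
    confidence_logits: one scalar per signal (assumed softmax-normalised).
    """
    weights = softmax(confidence_logits)
    total, count = 0.0, 0
    n = len(pred)
    for i in range(n):
        for j in range(i + 1, n):
            # Predicted probability that essay i ranks above essay j.
            p_ij = 1.0 / (1.0 + math.exp(-(pred[i] - pred[j])))
            p_ij = min(max(p_ij, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            for w, sig in zip(weights, signals):
                if sig[i] == sig[j]:
                    continue  # this signal gives no order for the pair
                target = 1.0 if sig[i] > sig[j] else 0.0
                bce = -(target * math.log(p_ij)
                        + (1.0 - target) * math.log(1.0 - p_ij))
                total += w * bce
                count += 1
    return total / count if count else 0.0

# Two partly conflicting signals over three essays (toy values).
signals = [[1, 2, 3], [10, 20, 15]]
logits = [0.0, 0.0]
agreeing = pairwise_rank_aggregation_loss([0.1, 0.5, 0.9], signals, logits)
reversed_order = pairwise_rank_aggregation_loss([0.9, 0.5, 0.1], signals, logits)
print(f"agreeing: {agreeing:.4f}, reversed: {reversed_order:.4f}")
```

Predictions that follow the majority ordering of the signals receive the lower loss; raising one signal's confidence logit shifts the loss toward that signal's ordering.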
Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li | Qiang Su | Yitao Yang | Yimin Jiang | Cong Wang | Hong Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large language models have demonstrated exceptional language understanding capabilities in many NLP tasks. Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models while maintaining a constant number of computational operations. Existing MoE models adopt a fixed gating network where each token is computed by the same number of experts. This contradicts our intuition that the tokens in each sequence vary in their linguistic complexity and, consequently, require different computational costs. Prior research has said little about the trade-off between computation per token and model performance. This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts based on the expert probability distribution. Adaptive gating preserves sparsity while improving training efficiency. We further draw upon curriculum learning to better align the order of training samples and maximize the training time savings. Extensive experiments on diverse NLP tasks show that adaptive gating reduces training time by up to 22.5% while maintaining inference quality. Moreover, we conduct a comprehensive analysis of the gating decisions and present our insights on which tokens are inherently difficult to process, depending on the specific language task.
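The routing idea described above, a variable number of experts per token chosen from the gate's probability distribution, might look roughly like the sketch below. The decision rule is an assumption made for illustration (use one expert when the gap between the two largest gate probabilities reaches a threshold, otherwise two); the abstract does not state the exact criterion, and `adaptive_route` and `gap_threshold` are hypothetical names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_route(router_logits, gap_threshold=0.3):
    """Return (expert_index, weight) pairs for one token.

    Assumed rule: a confident gate (large top-1 vs. top-2 gap) sends the
    token to one expert; otherwise the token is split across the top two.
    Requires at least two experts.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top1, top2 = ranked[0], ranked[1]
    if probs[top1] - probs[top2] >= gap_threshold:
        return [(top1, 1.0)]
    norm = probs[top1] + probs[top2]
    return [(top1, probs[top1] / norm), (top2, probs[top2] / norm)]

confident = adaptive_route([4.0, 0.5, 0.2, 0.1])   # one dominant expert
uncertain = adaptive_route([1.0, 0.9, 0.2, 0.1])   # two close candidates
print(confident)
print(uncertain)
```

Tokens with a confident gate cost one expert's worth of computation, while ambiguous tokens still get two, which is where any training-time saving in this sketch would come from.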
2020
Social Emergency Event Judgement based on BiLSTM-CRF (基于BiLSTM-CRF的社会突发事件研判方法)
Huijun Hu (胡慧君) | Cong Wang (王聪) | Jianhua Dai (代建华) | Maofu Liu (刘茂福)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
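Classifying social emergency events and judging their severity level is a self-evidently important part of emergency response. However, most current research identifies judgement evidence manually or with rules, an approach that is severely limited by the complex makeup of social emergency events and the flexibility of their linguistic descriptions. Drawing on the idea of event extraction, this paper treats the event type and the judgement evidence as elements of an event and identifies them at a fine granularity with a BiLSTM-CRF method; the two are combined, with the classification result serving as an input for level judgement, to identify the judgement evidence. Finally, the identification results are combined with an attention mechanism for level judgement, so that precise identification of the judgement evidence improves the accuracy of level judgement. Experiments show that, compared with manual or rule-based identification of judgement evidence, the proposed method is more robust and also achieves good results on social emergency event judgement. Keywords: event classification; judgement evidence identification; level judgement; BiLSTM-CRF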