2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Jiaqi Zhao | Miao Zhang | Ming Wang | Yuzhang Shang | Kaihao Zhang | Weili Guan | Yaowei Wang | Min Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) suffer severe performance degradation under extremely low-bit (sub 2-bit) quantization. Several existing sub 2-bit post-training quantization (PTQ) methods adopt a mixed-precision scheme that uses an unstructured, fine-grained mask to explicitly mark salient weights, which introduces an extra 1 bit or more per weight. To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which for the first time enables weight quantization to 1.61 bits. Specifically, we first introduce a one-dimensional structured mask, based on input activations and motivated by reducing the upper bound of quantization error, that adds only a negligible 0.0002 bits per weight and allocates the corresponding salient weight channels to 4-bit. For binarizing the non-salient channels, we then present an efficient block-wise scaling-factor optimization framework that accounts for implicit row-wise correlations and angular biases. Unlike prior works that concentrate on adjusting quantization methodologies, we further propose a novel paradigm called quantization preprocessing, arguing that transforming the weight distribution of the pretrained model before quantization alleviates the difficulty of per-channel extremely low-bit PTQ. Extensive experiments show that PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization. Codes are available at https://github.com/zjq0455/PTQ1.61.
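As a rough illustration of the mixed-precision idea described above, the PyTorch-style sketch below ranks input channels by calibration-activation magnitude (a one-dimensional structured mask over channels), quantizes the selected salient channels to 4-bit with round-to-nearest, and binarizes the remaining channels with a per-row scaling factor. The salient ratio, the simple quantizer, and the closed-form binarization scale are illustrative assumptions, not the paper's actual optimization.

```python
import torch

def quantize_layer(weight, act, salient_ratio=0.05, bits=4):
    """Illustrative mixed-precision PTQ sketch (not PTQ1.61's exact algorithm).

    weight: (out_features, in_features) full-precision weight matrix
    act:    (n_samples, in_features) calibration activations
    """
    # 1. Structured 1-D mask: rank input channels by activation energy and
    #    mark the top fraction as salient (one flag per column, hence the
    #    negligible per-weight overhead mentioned in the abstract).
    channel_score = act.abs().mean(dim=0)                    # (in_features,)
    n_salient = max(1, int(salient_ratio * weight.shape[1]))
    salient_idx = channel_score.topk(n_salient).indices
    mask = torch.zeros(weight.shape[1], dtype=torch.bool)
    mask[salient_idx] = True

    q_weight = torch.empty_like(weight)

    # 2. Salient channels: simple round-to-nearest 4-bit quantization.
    w_s = weight[:, mask]
    scale = (w_s.abs().amax(dim=1, keepdim=True) / (2 ** (bits - 1) - 1)).clamp_min(1e-8)
    q_weight[:, mask] = torch.round(w_s / scale).clamp(-8, 7) * scale

    # 3. Non-salient channels: per-row binarization with a scaling factor
    #    alpha minimizing ||w - alpha * sign(w)||^2, i.e. alpha = mean(|w|).
    w_b = weight[:, ~mask]
    alpha = w_b.abs().mean(dim=1, keepdim=True)
    q_weight[:, ~mask] = alpha * torch.sign(w_b)

    return q_weight, mask
```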
AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation
Ming Wang | Peidong Wang | Lin Wu | Xiaocui Yang | Daling Wang | Shi Feng | Yuxin Chen | Bixuan Wang | Yifei Zhang
Findings of the Association for Computational Linguistics: ACL 2025
Constrained by the cost and ethical concerns of involving real seekers in AI-driven mental health research, researchers develop LLM-based conversational agents (CAs) with tailored configurations, such as profiles, symptoms, and scenarios, to simulate seekers. While these efforts advance AI in mental health, achieving more realistic seeker simulation remains hindered by two key challenges: dynamic evolution and multi-session memory. Seekers' mental states often fluctuate during counseling, which typically spans multiple sessions. To address this, we propose AnnaAgent, an emotional and cognitive dynamic agent system equipped with tertiary memory. AnnaAgent incorporates an emotion modulator and a complaint elicitor trained on real counseling dialogues, enabling dynamic control of the simulator's configurations. Additionally, its tertiary memory mechanism effectively integrates short-term and long-term memory across sessions. Both automated and manual evaluations demonstrate that AnnaAgent achieves more realistic seeker simulation in psychological counseling than existing baselines. The ethically reviewed and screened code can be found at https://github.com/sci-m-wang/AnnaAgent.
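A minimal sketch of what a tertiary memory might look like, assuming three tiers (current turn, current session, cross-session summaries) and a simple end-of-session summarization hook; the tier names and retrieval policy are hypothetical and not AnnaAgent's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TertiaryMemory:
    """Illustrative three-tier memory for a simulated seeker agent."""
    short_term: list = field(default_factory=list)   # current-turn context
    session: list = field(default_factory=list)      # current counseling session
    long_term: list = field(default_factory=list)    # summaries carried across sessions

    def add_turn(self, utterance: str) -> None:
        self.short_term.append(utterance)
        self.session.append(utterance)

    def end_session(self, summarize) -> None:
        # Compress the finished session into long-term memory, then reset.
        self.long_term.append(summarize(self.session))
        self.short_term.clear()
        self.session.clear()

    def build_prompt_context(self, k: int = 3) -> str:
        # Combine the last k cross-session summaries with the recent turns.
        return "\n".join(self.long_term[-k:] + self.short_term)
```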
2024
LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models
Jiaqi Zhao | Miao Zhang | Chao Zeng | Ming Wang | Xuebo Liu | Liqiang Nie
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Post-training quantization (PTQ) for large language models (LLMs) significantly accelerates model inference and relieves memory constraints without requiring model training. A “smoothing paradigm” is commonly used in LLM quantization, which transfers the quantization difficulty of activations to weights via mathematically equivalent transformations. However, existing methods face two issues: 1) most smoothing parameters are hand-crafted, which leads to suboptimal results; 2) performance degrades significantly on unseen datasets. To address these challenges, this paper introduces a robust, learnable smoothing-based PTQ framework called LRQuant. First, we adopt a learnable paradigm to find optimal smoothing parameters, initialized by the logarithmic activation equivalent. In addition, we empirically find that relying on MSE loss alone can hardly yield optimal quantization results, so we propose a novel loss based on the negative logarithm of the cosine similarity (NLC loss) between the outputs of the full-precision and quantized blocks. Finally, we are the first to introduce test-time adaptation (TTA) into LLM quantization, which allows rapid model adaptation during testing to improve generalization performance. More surprisingly, we find that with our TTA method we can, in some cases, achieve better results on test sets than directly using the test sets for calibration, while avoiding catastrophic forgetting. Codes are available at https://github.com/zjq0455/RLQ.
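The NLC loss described above can be sketched directly as the negative logarithm of the cosine similarity between the full-precision and quantized block outputs; the flattening, clamping, and mean reduction below are assumptions for illustration, and combining it with an MSE term is likewise only a plausible usage.

```python
import torch
import torch.nn.functional as F

def nlc_loss(fp_out: torch.Tensor, q_out: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative-log-cosine loss between full-precision and quantized block outputs."""
    # Flatten all non-batch dimensions and compare per sample.
    cos = F.cosine_similarity(fp_out.flatten(1), q_out.flatten(1), dim=-1)
    return -torch.log(cos.clamp_min(eps)).mean()

# Hypothetical usage during block-wise calibration:
# loss = F.mse_loss(q_out, fp_out) + nlc_loss(fp_out, q_out)
```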
2022
EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing
Chengyu Wang | Minghui Qiu | Taolin Zhang | Tingting Liu | Lei Li | Jianing Wang | Ming Wang | Jun Huang | Wei Lin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Pre-Trained Models (PTMs) have reshaped the development of Natural Language Processing (NLP) and achieved significant improvements on various benchmarks. Yet it is not easy for industrial practitioners to obtain high-performing PTM-based models without a large amount of labeled training data, or to deploy them online with fast inference speed. To bridge this gap, EasyNLP is designed to make it easy to build NLP applications, supporting a comprehensive suite of NLP algorithms. It further features knowledge-enhanced pre-training, knowledge distillation, and few-shot learning functionalities, and provides a unified framework for model training, inference, and deployment in real-world applications. EasyNLP has powered over ten business units within Alibaba Group and is seamlessly integrated with the Platform of AI (PAI) products on Alibaba Cloud. The source code of EasyNLP is released at GitHub (https://github.com/alibaba/EasyNLP).
2021
Word Sense Disambiguation: Towards Interactive Context Exploitation from Both Word and Sense Perspectives
Ming Wang | Yinglin Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Recently proposed Word Sense Disambiguation (WSD) systems have approached the estimated upper bound of the task on standard evaluation benchmarks. However, these systems typically disambiguate the words in a document almost independently, underutilizing sense and word dependencies in context. In this paper, we convert these nearly isolated decisions into interrelated ones by exposing senses in context when learning sense embeddings in a similarity-based Sense Aware Context Exploitation (SACE) architecture. Meanwhile, we enhance context embedding learning with selected sentences from the same document, rather than using only the sentence in which each ambiguous word appears. Experiments on both English and multilingual WSD datasets show the effectiveness of our approach, surpassing the previous state of the art by large margins (3.7% and 1.2%, respectively), especially in few-shot (14.3%) and zero-shot (35.9%) scenarios.
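For readers unfamiliar with similarity-based WSD, the backbone such systems build on can be sketched as follows: encode the context of the ambiguous word, then pick the candidate sense whose embedding is most similar. The sketch below assumes precomputed embeddings and omits SACE's interactive, sense-aware context exploitation.

```python
import torch
import torch.nn.functional as F

def disambiguate(context_emb: torch.Tensor, sense_embs: dict) -> str:
    """Return the sense key whose embedding is most similar to the context embedding.

    context_emb: 1-D embedding of the ambiguous word's context
    sense_embs:  mapping from sense key to a 1-D sense embedding
    """
    best_sense, best_score = None, float("-inf")
    for sense, emb in sense_embs.items():
        score = F.cosine_similarity(context_emb, emb, dim=0).item()
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```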
Enhancing the Context Representation in Similarity-based Word Sense Disambiguation
Ming Wang | Jianzhang Zhang | Yinglin Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
In previous similarity-based WSD systems, much effort has been devoted to learning comprehensive sense embeddings using contextual representations and knowledge sources. However, the context embedding of an ambiguous word is learned using only the sentence in which the word appears, neglecting its global context. In this paper, we investigate the contribution of both the word-level and sense-level global context of an ambiguous word to disambiguation. Experiments show that the Context-Oriented Embedding (COE) can improve a similarity-based system’s WSD performance by relatively large margins, achieving state-of-the-art results on all-words WSD benchmarks in the knowledge-based category.
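A minimal sketch of the underlying idea, assuming global context is folded in as a simple weighted average of the local sentence embedding and the mean embedding of other sentences from the same document; the actual COE combines word-level and sense-level global context more selectively.

```python
import torch

def context_oriented_embedding(sentence_emb: torch.Tensor,
                               document_embs: list,
                               weight: float = 0.5) -> torch.Tensor:
    """Blend a local sentence embedding with document-level global context (illustrative only)."""
    global_emb = torch.stack(document_embs).mean(dim=0)
    return weight * sentence_emb + (1.0 - weight) * global_emb
```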
2020
A Synset Relation-enhanced Framework with a Try-again Mechanism for Word Sense Disambiguation
Ming Wang | Yinglin Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Contextual embeddings have proved overwhelmingly effective for the task of Word Sense Disambiguation (WSD) compared with other sense representation techniques. However, these embeddings fail to capture the sense knowledge in semantic networks. In this paper, we propose a Synset Relation-Enhanced Framework (SREF) that leverages sense relations both to enhance sense embeddings and to drive a try-again mechanism that performs WSD a second time, after obtaining basic sense embeddings from augmented WordNet glosses. Experiments on all-words and lexical-sample datasets show that the proposed system achieves new state-of-the-art results, outperforming previous knowledge-based systems by at least 5.5 F1 points. When the system uses sense embeddings learned from SemCor, it outperforms all previous supervised systems with only 20% of the SemCor data.
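A rough sketch of gloss augmentation with synset relations, using NLTK's WordNet interface; the chosen relation set and plain concatenation are assumptions for illustration, and the try-again mechanism is not shown.

```python
# Requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def augmented_gloss(synset, relations=("hypernyms", "hyponyms", "similar_tos")) -> str:
    """Augment a synset's gloss with the glosses of related synsets (illustrative only)."""
    parts = [synset.definition()] + synset.examples()
    for rel in relations:
        for related in getattr(synset, rel)():
            parts.append(related.definition())
    return " ".join(parts)

# Example: augmented gloss for the first WordNet sense of "bank"
print(augmented_gloss(wn.synsets("bank")[0]))
```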
2017
YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter
Ming Wang | Biao Chu | Qingxun Liu | Xiaobing Zhou
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Sentiment analysis is one of the central issues in Natural Language Processing and has become increasingly important in many fields. Typical sentiment analysis classifies the sentiment of sentences into several discrete classes (e.g., positive or negative). In this paper we describe our deep learning system (combining a GRU and an SVM) for two-, three-, and five-point tweet polarity classification. We first trained a gated recurrent neural network using pre-trained word embeddings, then extracted features from the GRU layer and fed them into a support vector machine to fulfill both the classification and quantification subtasks. The proposed approach achieved 37th, 19th, and 14th place in subtasks A, B, and C, respectively.
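A minimal sketch of the two-stage GRU-SVM setup described above, assuming a PyTorch GRU encoder and a scikit-learn SVM; hyperparameters, the pre-trained word embeddings, and the GRU training loop are omitted and purely illustrative.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class GRUEncoder(nn.Module):
    """GRU sentence encoder; the final hidden state serves as the feature vector."""
    def __init__(self, vocab_size: int, emb_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, h_n = self.gru(self.emb(token_ids))   # h_n: (1, batch, hidden)
        return h_n.squeeze(0)                    # (batch, hidden)

def fit_svm_on_gru_features(encoder: GRUEncoder, token_ids: torch.Tensor, labels):
    """Freeze the (optionally pre-trained) GRU, extract features, and fit an SVM on them."""
    encoder.eval()
    with torch.no_grad():
        feats = encoder(token_ids).cpu().numpy()
    svm = SVC(kernel="linear")
    svm.fit(feats, labels)
    return svm
```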