Xianglong Liu
2026
FrontCoder: Scaling Visual Fidelity in Front-End Code Generation
Jun Feng | Jian Yang | Wei Zhang | Jing Wang | Keyi Chen | Xiaokun Yang | Weicheng Gu | Yihang Lou | Yan Bai | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) for code generation have achieved remarkable progress in synthesizing functional code from natural language instructions. However, a critical challenge persists in generating visually accurate and structurally sound front-end code that faithfully renders user-intended layouts and interfaces. Most existing works focus primarily on functional correctness, overlooking the visual fidelity and rendering quality essential for front-end development. To address this gap, we present a comprehensive data construction and training pipeline to enhance the front-end code generation capabilities of code LLMs. We use a three-stage training approach: continual pre-training on synthetic data, quality-controlled supervised fine-tuning, and reinforcement learning with checklist-based rewards. Our evaluation on front-end code generation benchmarks reveals that even strong base models struggle with visual faithfulness and layout complexity. Our fully trained model demonstrates substantial improvements over baseline approaches across all domains, achieving performance competitive with frontier models while maintaining generation efficiency. These results underscore the critical importance of stage-aligned data curation and vision-grounded optimization in developing reliable front-end code generation systems. Our code and data are open-sourced at https://github.com/leanfeng1/FrontCoder.
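To make the idea of a checklist-based reward concrete, here is a minimal Python sketch, assuming each checklist item is scored pass/fail and the reward is the weighted fraction of items passed. The `ChecklistItem` structure, the weights, and the string-matching checks are illustrative assumptions: the abstract does not say how FrontCoder's checklists are built or scored, and its vision-grounded optimization would judge the rendered page rather than raw HTML.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    """One verifiable requirement on a generated page (hypothetical structure)."""
    description: str
    check: Callable[[str], bool]  # stand-in for a vision-grounded judge
    weight: float = 1.0


def checklist_reward(html: str, checklist: List[ChecklistItem]) -> float:
    """Weighted fraction of satisfied checklist items, in [0, 1]."""
    total = sum(item.weight for item in checklist)
    if total == 0:
        return 0.0
    satisfied = sum(item.weight for item in checklist if item.check(html))
    return satisfied / total


# Usage: three requirements derived from a (hypothetical) user instruction.
checklist = [
    ChecklistItem("has a navigation bar", lambda h: "<nav" in h),
    ChecklistItem("has a footer", lambda h: "<footer" in h),
    ChecklistItem("uses a flex layout", lambda h: "display: flex" in h, weight=2.0),
]
page = "<nav>Home</nav><main style='display: flex'></main>"
reward = checklist_reward(page, checklist)  # (1 + 2) / 4 = 0.75
```

A fractional reward like this gives partial credit for partially correct layouts, which is one reason checklist rewards are denser than a single pass/fail signal.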
UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models
Jiajun Wu | Jian Yang | Wei Zhang | Linzheng Chai | Yuchi Ma | Ensheng Shi | Yuqing Ma | Zhoujun Li | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, their effectiveness heavily relies on supervised training with extensive labeled (e.g., question-answering pairs) or unlabeled datasets (e.g., code snippets), which are often expensive and difficult to obtain at scale. To address this limitation, this paper introduces IPC, an unsupervised framework that leverages Internal Probing of LLMs for Code generation without any external corpus, not even unlabeled code snippets. IPC comprises problem space probing, test understanding probing, solution space probing, and knowledge consolidation and reinforcement, which together probe the internal knowledge and confidence patterns of LLMs. Further, IPC identifies reliable code candidates through self-consistency mechanisms and representation-based quality estimation to train UCoder (coder with unsupervised learning). We validate the proposed approach across multiple code benchmarks, demonstrating that unsupervised methods can achieve competitive performance compared to supervised approaches while significantly reducing the dependency on labeled data and computational resources. Analytic experiments reveal that internal model states contain rich signals about code quality and correctness, and that properly harnessing these signals enables effective unsupervised learning for code generation tasks, opening new directions for training code LLMs in resource-constrained scenarios.
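IPC selects reliable code candidates partly through self-consistency. The Python sketch below shows one common form of that idea, execution-based agreement: candidates are grouped by the outputs they produce on probe inputs, and the largest group is kept. The probe inputs, the rule that drops candidates raising exceptions, and the function names are assumptions; the sketch leaves out IPC's representation-based quality estimation entirely.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple


def output_signature(program: Callable, probe_inputs: Sequence) -> Tuple[str, ...]:
    """Run a candidate on every probe input; exceptions become a sentinel."""
    sig = []
    for x in probe_inputs:
        try:
            sig.append(repr(program(x)))
        except Exception as exc:
            sig.append(f"<error:{type(exc).__name__}>")
    return tuple(sig)


def select_by_consistency(candidates: List[Callable], probe_inputs: Sequence) -> List[int]:
    """Indices of the largest group of candidates that agree on all probes.

    Candidates that raise on any probe are discarded before voting.
    """
    clusters = defaultdict(list)
    for i, prog in enumerate(candidates):
        sig = output_signature(prog, probe_inputs)
        if any(s.startswith("<error:") for s in sig):
            continue
        clusters[sig].append(i)
    if not clusters:
        return []
    return max(clusters.values(), key=len)


# Usage: four sampled "solutions" for squaring a number.
candidates = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x, lambda x: abs(x) * abs(x)]
chosen = select_by_consistency(candidates, probe_inputs=[-2, 0, 3])  # [0, 1, 3]
```

Agreement on outputs is only a proxy for correctness: several candidates can share the same bug, which is one motivation for combining it with a second, independent quality signal.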
Scaling Laws for Code: Every Programming Language Matters
Jian Yang | Shuyue Guo | Linzheng Chai | Wei Zhang | Aishan Liu | Chuan Hao | Zhoujun Li | Xin Zhao | Xianglong Liu | Weifeng Lv | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages (PLs) have varying impacts during pre-training that significantly affect base model performance, leading to inaccurate performance prediction. Existing works focus on language-agnostic settings, neglecting the inherently multilingual nature of modern software development. Therefore, it is first necessary to investigate the scaling laws of different PLs, and then consider their mutual influences to arrive at the final multilingual scaling law. In this paper, we present the first systematic exploration of scaling laws for multilingual code pre-training, conducting more than 1,000 experiments (equivalent to over 336,000 H800 GPU hours) across multiple PLs, model sizes (0.2B to 14B parameters), and dataset sizes (1T tokens). We establish scaling laws for code LLMs across multiple programming languages, showing that interpreted languages benefit more from increased scale than compiled ones. Multilingual pre-training provides synergistic benefits, especially between syntactically similar languages, with parallel pairing (concatenating code with translations) significantly enhancing cross-lingual abilities. We propose a proportion-dependent multilingual scaling law that optimally allocates training tokens by prioritizing high-utility languages (e.g., Python), balancing high-synergy pairs (e.g., JavaScript-TypeScript), and reducing allocation to fast-saturating languages (e.g., Rust), achieving superior performance across all languages compared to uniform distribution.
LoopCoder: Scaling Code Intelligence via Looped Language Models
Jian Yang | Wei Zhang | Shuyue Guo | Yizhi LI | Linzheng Chai | Zhengmao Ye | Shukai Liu | Yuyang Song | Jiajun Wu | Che Liu | Tianyu Zheng | Siwei Wu | Leo L | Xudong Ma | Chuan Hao | Ran Tao | Yan Xing | Jianzhou Wang | Mingjie Tang | Aishan Liu | Zhoujun Li | Xianglong Liu | Weifeng Lv | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
While large language models (LLMs) have mastered syntax-level code generation, complex algorithmic reasoning remains a challenge, typically addressed by scaling model depth and parameter count. Universal Transformers (UT) offer a compelling alternative by introducing a recurrent inductive bias that aligns with the recursive nature of programming logic. However, training looped architectures at scale has historically been hindered by severe instability and optimization difficulties associated with backpropagation through time (BPTT). We present LoopCoder (40B-A80B), pre-trained on 12T+ code and general tokens, along with LoopCoder-Thinking and LoopCoder-Instruct variants. LoopCoder is the first large-scale looped transformer for code and achieves performance comparable to standard dense architectures with more parameters. Unlike prior approaches that restrict recurrence to small-scale tasks, we implement a comprehensive looped training protocol spanning both pre-training and post-training phases. We initialize the model via dense-to-loop transformation, folding a pre-trained dense checkpoint to initialize a recurrent block, followed by rigorous looped pre-training and specialized post-training for instruction following and reasoning. Our results establish a robust recipe for scaling coding intelligence via recurrent computation, demonstrating that dense checkpoints serve as an optimal foundation for evolving into dynamic, looped reasoners.
Context as a Tool: Context Management for Long-Horizon SWE-Agents
Shukai Liu | Bo Jiang | Jian Yang | Yizhi LI | Jinyang Guo | Xianglong Liu | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CaT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CaT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CaT-Generator, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
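The three-part workspace described above can be sketched as a small data structure whose compression step is an explicit, agent-invoked operation rather than a passive trigger. This is a minimal Python sketch under stated assumptions: the class and method names are hypothetical, `summarize` stands in for a model-generated summary, and the abstract does not specify how CaT stores or renders each region.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContextWorkspace:
    task: str                                            # stable task semantics, kept verbatim
    long_term: List[str] = field(default_factory=list)   # condensed summaries
    short_term: List[str] = field(default_factory=list)  # recent raw interactions

    def append(self, turn: str) -> None:
        self.short_term.append(turn)

    def compress(self, summarize: Callable[[List[str]], str], keep_last: int = 2) -> None:
        """Tool call: fold all but the last `keep_last` turns into one summary."""
        if len(self.short_term) <= keep_last:
            return
        cut = len(self.short_term) - keep_last
        old, self.short_term = self.short_term[:cut], self.short_term[cut:]
        self.long_term.append(summarize(old))

    def render(self) -> str:
        """Assemble the prompt context from the three regions."""
        lines = [f"[TASK] {self.task}"]
        lines += [f"[MEMORY] {m}" for m in self.long_term]
        lines += [f"[RECENT] {t}" for t in self.short_term]
        return "\n".join(lines)


# Usage: the agent decides a milestone was reached and calls the tool.
ws = ContextWorkspace(task="Fix the failing test in utils.py")
for turn in ["ls repo", "open utils.py", "run pytest", "edit line 40"]:
    ws.append(turn)
ws.compress(lambda turns: "; ".join(turns), keep_last=1)  # stand-in for an LLM summary
```

Keeping the task region outside the compressible history is what prevents repeated summarization from drifting away from the original goal.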
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Zonghao Ying | Yangguang Shao | Jianle Gan | Gan Xu | Wenxin Zhang | Quanchen Zou | Junzheng Shi | Zhenfei Yin | Mingchuan Zhang | Aishan Liu | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Large vision–language model (LVLM)-based web agents are emerging as powerful automation tools but face severe security risks in real-world deployment. Existing benchmarks offer limited coverage, typically isolating user-level prompts from environmental threats, thus failing to capture the full spectrum of vulnerabilities. To address this, we present SecureWebArena, the first holistic security benchmark for web agents. SecureWebArena features a unified suite of six realistic web environments with 2,970 adversarial trajectories, covering a structured taxonomy of six attack vectors that span both user-level and environment-level manipulations. Crucially, we introduce a multi-layered evaluation protocol that dissects agent failures across internal reasoning, behavioral execution, and task outcomes, enabling fine-grained risk analysis beyond simple success metrics. Experiments on 9 representative LVLMs reveal universal vulnerabilities to subtle manipulations and uncover significant trade-offs between model specialization and security. SecureWebArena establishes a rigorous foundation for advancing the development of trustworthy web agents.
Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training
Jinyang Du | Ruihao Gong | Linghan Ai | Zining Wang | Yunke Peng | Yao Wang | Lei Yan | Wxuefei | Yaoyuan Wang | Jinyang Guo | Dahua Lin | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Training large language models (LLMs) at 4-bit precision offers substantial efficiency gains but remains challenging due to the limited dynamic range and coarse numerical resolution. Existing 4-bit training pipelines typically rely on max-scaling, which is ill-suited for heavy-tailed LLM tensor distributions and leads to severe under-utilization of the FP4 quantization grid in the low-magnitude region. This effect causes pronounced representation collapse and large rounding errors for the values that dominate LLM computation. In this work, we derive the theoretically optimal scaling for FP4 under heavy-tailed inputs, revealing why max-scaling is intrinsically suboptimal. Guided by this analysis, we propose Half-S, a simple and efficient scaling strategy that uses half-scaling as a hardware-friendly default and falls back to an MSE-based clipping threshold when needed, yielding a close approximation to the theoretical optimum under real LLM statistics. Extensive experiments on large-scale pretraining and downstream fine-tuning show that Half-S consistently narrows the gap to BF16 in both convergence and final model quality, while preserving the efficiency benefits of 4-bit computation. Under native FP4 support, Half-S is estimated to provide up to 1.8× end-to-end training speedup. These results indicate that Half-S provides a simple and effective correction to max-scaling, substantially improving the stability and accuracy of 4-bit LLM training.
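The contrast between max-scaling and a smaller scale can be illustrated with FP4 (E2M1) fake quantization of a single block. The Python sketch below assumes that half-scaling means dividing the max-derived scale by two and that the MSE-based fallback is a small grid search over clipping thresholds; both readings and the fallback trigger are assumptions, not Half-S's published procedure, and the sketch ignores block sizes and scale encodings used by real FP4 formats.

```python
from typing import List

# Non-negative magnitudes representable in FP4 E2M1.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def round_to_fp4(v: float) -> float:
    """Nearest E2M1 value, saturating at +/-6. Ties go to the smaller
    magnitude here for simplicity (hardware typically rounds to nearest even)."""
    mag = min(abs(v), FP4_MAX)
    q = min(FP4_GRID, key=lambda g: abs(g - mag))
    return q if v >= 0 else -q


def fake_quant(xs: List[float], scale: float) -> List[float]:
    """Divide by the scale, round onto the FP4 grid, and scale back."""
    return [round_to_fp4(x / scale) * scale for x in xs]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def max_scale(xs: List[float]) -> float:
    """Max-scaling: map max|x| onto the largest FP4 magnitude (6)."""
    m = max(abs(x) for x in xs)
    return m / FP4_MAX if m > 0 else 1.0


def half_scale(xs: List[float]) -> float:
    # Assumption: "half-scaling" read as halving the max-derived scale,
    # so values above max|x| / 2 saturate but small values get finer steps.
    return max_scale(xs) / 2


def choose_scale(xs: List[float]) -> float:
    """Half-scale by default; if that is worse than max-scaling, search
    clipping thresholds from 100% down to 50% of max|x| for the lowest MSE.
    The fallback trigger and search grid are hypothetical."""
    s = half_scale(xs)
    if mse(xs, fake_quant(xs, s)) <= mse(xs, fake_quant(xs, max_scale(xs))):
        return s
    base = max_scale(xs)
    ratios = [1.0 - 0.05 * k for k in range(11)]
    return min((base * r for r in ratios), key=lambda sc: mse(xs, fake_quant(xs, sc)))


# Usage: a block of small values plus one outlier.
block = [0.1, -0.2, 0.15, 0.05, -0.1, 0.3, 6.0]
zeros_max = sum(q == 0 for q in fake_quant(block, max_scale(block)))    # 5
zeros_half = sum(q == 0 for q in fake_quant(block, half_scale(block)))  # 3
```

In this toy block, halving the scale keeps two more small values off zero but clips the outlier from 6.0 to 3.0, so the fallback rejects it and `choose_scale(block)` returns the max-derived scale of 1.0. The trade-off the abstract describes depends on real heavy-tailed LLM statistics, which a seven-element example does not reproduce.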
Uncovering Strategic Egoism Behaviors in Large Language Models
Yaoyuan Zhang | Zonghao Ying | Aishan Liu | Jian Yang | Tianlin Li | Yaodong Yang | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) exhibit growing safety and alignment risks, hindering their deployment in high-stakes decision-making scenarios. In this paper, we identify a previously underexplored risk: similar to humans, LLMs can exhibit egoistic decision-making, in which they pursue short-term self-benefits through improper means while disregarding collective welfare and ethical constraints. We term this phenomenon Strategic Egoism (SE). To systematically evaluate SE, we introduce SEBench, a benchmark comprising 880 decision-making scenarios across 11 domains involving explicit profit temptations, which measures egoistic behavior along 6 psychologically grounded dimensions (e.g., rule circumvention). Each scenario adopts a single-role decision-making setting with carefully designed choice options to elicit self-serving strategies. Extensive experiments on 9 proprietary LLMs reveal that SE behaviors are widespread, with an average occurrence rate of 67.96%, and frequently manifest as manipulative coercion. Notably, we find that models more susceptible to profit temptations also exhibit broader safety deficiencies, including higher toxicity, lower truthfulness, increased jailbreak vulnerability, and elevated Dark Triad–style trait scores. Drawing inspiration from psychological interventions, we further propose SEGuard, a lightweight mitigation that reinforces situational constraints and suppresses egoistic tactics.
2025
Lexical Diversity-aware Relevance Assessment for Retrieval-Augmented Generation
Zhange Zhang | Yuqing Ma | Yulong Wang | Shan He | Tianbo Wang | Siqi He | Jiakai Wang | Xianglong Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Retrieval-Augmented Generation (RAG) has proven effective in enhancing the factuality of LLMs’ generation, making it a focal point of research. However, previous RAG approaches overlook the lexical diversity of queries, hindering their ability to achieve a granular relevance assessment between queries and retrieved documents and resulting in suboptimal performance. In this paper, we introduce a Lexical Diversity-aware RAG (DRAG) method to address the biases in relevant information retrieval and utilization induced by lexical diversity. Specifically, a Diversity-sensitive Relevance Analyzer is proposed to decouple and assess the relevance of different query components (words, phrases) based on their levels of lexical diversity, ensuring precise and comprehensive document retrieval. Moreover, a Risk-guided Sparse Calibration strategy is introduced to calibrate generated tokens that are heavily affected by irrelevant content. Through these modules, DRAG effectively retrieves relevant documents and leverages their pertinent knowledge to refine the original results and generate meaningful outcomes. Extensive experiments on widely used benchmarks demonstrate the efficacy of our approach, yielding a 10.6% accuracy improvement on HotpotQA.
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding | Wentao Jiang | Shunyu Liu | Yongcheng Jing | Jinyang Guo | Yingjie Wang | Jing Zhang | Zengmao Wang | Ziwei Liu | Bo Du | Xianglong Liu | Dacheng Tao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. However, recent methods focus on search accuracy while overlooking computational efficiency. The challenges of accelerating ToT lie in the frequent switching of reasoning focus and the redundant exploration of suboptimal solutions. To alleviate this dilemma, we propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that aims to dynamically optimize the reasoning path during inference. It includes the Parallelism Streamline in the generation phase, which builds flexible and adaptive parallelism over arbitrary paths through cache management and alignment. Meanwhile, the Search and Transition Mechanism filters potential candidates to dynamically keep the reasoning focus on more promising solutions with less redundancy. Experiments with Qwen-2.5 and Llama-3 on math and code datasets show that DPTS improves efficiency by 2-4× on average while maintaining or even surpassing existing reasoning algorithms in accuracy, making ToT-based reasoning more scalable and computationally efficient. Code is released at: https://github.com/yifu-ding/DPTS.
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
Zonghao Ying | Deyue Zhang | Zonglei Jing | Yisong Xiao | Quanchen Zou | Aishan Liu | Siyuan Liang | Xiangzheng Zhang | Xianglong Liu | Dacheng Tao
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-turn jailbreak attacks simulate real-world human interactions by engaging large language models (LLMs) in iterative dialogues, exposing critical safety vulnerabilities. However, existing methods often struggle to balance semantic coherence with attack effectiveness, resulting in either benign semantic drift or ineffective detection evasion. To address this challenge, we propose Reasoning-Augmented Conversation (RACE), a novel multi-turn jailbreak framework that reformulates harmful queries into benign reasoning tasks and leverages LLMs’ strong reasoning capabilities to compromise safety alignment. Specifically, we introduce an attack state machine framework to systematically model problem translation and iterative reasoning, ensuring coherent query generation across multiple turns. Building on this framework, we design gain-guided exploration, self-play, and rejection feedback modules to preserve attack semantics, enhance effectiveness, and sustain reasoning-driven attack progression. Extensive experiments on multiple LLMs demonstrate that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios, with attack success rates (ASRs) increasing by up to 96%. Notably, our approach achieves an average ASR of 83.3% against leading commercial models, including Gemini 2.0 Flash Thinking and OpenAI o1, underscoring its potency.
Token-Aware Editing of Internal Activations for Large Language Model Alignment
Tianbo Wang | Yuqing Ma | Kewei Liao | Chengzhao Yang | Zhange Zhang | Jiakai Wang | Xianglong Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Intervening in the internal activations of large language models (LLMs) provides an effective inference-time alignment approach to mitigate undesirable behaviors, such as generating erroneous or harmful content, thereby ensuring safe and reliable applications of LLMs. However, previous methods neglect the misalignment discrepancy among varied tokens, resulting in deviant alignment directions and inflexible editing strength. To address these issues, we propose a token-aware editing (TAE) approach to fully utilize token-level alignment information in the activation space, thereby realizing superior post-intervention performance. Specifically, a Mutual Information-guided Graph Aggregation (MIG) module first develops an MI-guided graph to exploit the tokens’ informative interaction for activation enrichment, thus improving alignment probing and facilitating intervention. Subsequently, Misalignment-aware Adaptive Intervention (MAI) comprehensively perceives the token-level misalignment degree from token representation and prediction to guide the adaptive adjustment of editing strength, thereby enhancing final alignment performance. Extensive experiments on three alignment capabilities demonstrate the efficacy of TAE, notably surpassing the baseline by 25.8% on the primary metric of truthfulness with minimal cost.
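Activation editing of this kind adds a steering vector to hidden states at inference time; a token-aware variant lets the editing strength vary per token. A minimal Python sketch follows, assuming the alignment direction and per-token misalignment scores are already given. In TAE these come from the MIG and MAI modules, which are not reproduced here, and the linear strength rule `base_strength * m` is a hypothetical choice.

```python
import math
from typing import List


def token_aware_edit(hidden: List[List[float]], direction: List[float],
                     misalignment: List[float], base_strength: float) -> List[List[float]]:
    """Shift each token's activation along a unit alignment direction,
    scaling the shift by that token's misalignment score (0 = leave as is)."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    edited = []
    for h, m in zip(hidden, misalignment):
        alpha = base_strength * m  # per-token editing strength
        edited.append([h_i + alpha * u_i for h_i, u_i in zip(h, unit)])
    return edited


# Usage: only the first token is judged misaligned, so only it moves.
hidden = [[0.0, 0.0], [1.0, 1.0]]
out = token_aware_edit(hidden, direction=[3.0, 4.0], misalignment=[1.0, 0.0], base_strength=2.0)
# out is approximately [[1.2, 1.6], [1.0, 1.0]]
```

A uniform method would apply the same `alpha` to every token; making it a function of the per-token score is what avoids over-editing tokens that were already aligned.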
2024
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
Ruihao Gong | Yang Yong | Shiqiao Gu | Yushi Huang | Chengtao Lv | Yunchen Zhang | Dacheng Tao | Xianglong Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, their substantial computational and memory requirements limit widespread adoption. Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating LLMs, albeit with potential risks to accuracy. Numerous studies have aimed to minimize the accuracy loss associated with quantization. However, their quantization configurations vary across studies and cannot be fairly compared. In this paper, we present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization. LLMC integrates dozens of algorithms, models, and hardware platforms, offering high extensibility from integer to floating-point quantization, from LLMs to vision-language models (VLMs), from fixed-bit to mixed precision, and from quantization to sparsification. Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats, providing novel insights and detailed analyses for further research and practical guidance for users. Our toolkit is available at https://github.com/ModelTC/llmc.
DB-LLM: Accurate Dual-Binarization for Efficient LLMs
Hong Chen | Chengtao Lv | Liang Ding | Haotong Qin | Xiabin Zhou | Yifu Ding | Xuebo Liu | Min Zhang | Jinyang Guo | Xianglong Liu | Dacheng Tao
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) have significantly advanced the field of natural language processing, while their expensive memory and computation consumption impedes practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we empirically investigate the micro and macro characteristics of ultra-low-bit quantization and present a novel Dual-Binarization method for LLMs, namely DB-LLM. At the micro level, we take both the accuracy advantage of 2-bit width and the efficiency advantage of binarization into account, introducing Flexible Dual Binarization (FDB). By splitting 2-bit quantized weights into two independent sets of binaries, FDB ensures the accuracy of representations and introduces flexibility, utilizing the efficient bitwise operations of binarization while retaining the inherent high sparsity of ultra-low-bit quantization. At the macro level, we identify a distortion in the predictions of the quantized LLM, which manifests as deviations related to the ambiguity of samples. We propose the Deviation-Aware Distillation (DAD) method, enabling the model to focus differently on various samples. Comprehensive experiments show that our DB-LLM not only significantly surpasses the current State-of-The-Art (SoTA) in ultra-low-bit quantization (e.g., perplexity decreased from 9.64 to 7.23), but also achieves an additional 20% reduction in computational consumption compared to the SoTA method under the same bit-width. Our code is available at https://github.com/Hon-Chen/DB-LLM.
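The observation behind splitting 2-bit weights into two binary sets is that a 2-bit code q in {0, 1, 2, 3} can be written as q = 2 * b_hi + b_lo with b_hi, b_lo in {0, 1}, so a dequantized weight becomes a sum of two scaled binary terms. The Python sketch below shows only this exact decomposition for a plain uniform quantizer. It is an illustration, not DB-LLM's FDB, which adds flexibility on top of the split in ways the abstract does not detail, and it omits Deviation-Aware Distillation.

```python
from typing import List, Tuple


def quantize_2bit(ws: List[float]) -> Tuple[List[int], float, float]:
    """Uniform asymmetric 2-bit quantization: w ~= offset + s * q, q in {0, 1, 2, 3}."""
    lo, hi = min(ws), max(ws)
    s = (hi - lo) / 3 if hi > lo else 1.0
    qs = [min(3, max(0, round((w - lo) / s))) for w in ws]
    return qs, s, lo


def split_dual_binary(qs: List[int]) -> Tuple[List[int], List[int]]:
    """Split each 2-bit code q = 2 * b_hi + b_lo into two {0, 1} planes."""
    return [(q >> 1) & 1 for q in qs], [q & 1 for q in qs]


def reconstruct(b_hi: List[int], b_lo: List[int], a_hi: float, a_lo: float,
                offset: float) -> List[float]:
    """w ~= a_hi * b_hi + a_lo * b_lo + offset; each product is a masked add."""
    return [a_hi * h + a_lo * l + offset for h, l in zip(b_hi, b_lo)]


# Usage: with a_hi = 2s and a_lo = s, the two binary planes reproduce the 2-bit result.
ws = [-0.3, 0.0, 0.3, 0.6]
qs, s, lo = quantize_2bit(ws)          # qs == [0, 1, 2, 3]
b_hi, b_lo = split_dual_binary(qs)     # [0, 0, 1, 1], [0, 1, 0, 1]
approx = reconstruct(b_hi, b_lo, 2 * s, s, lo)
```

Because each binary plane multiplies by 0 or 1, a matrix product with it reduces to selecting and summing activations, which is where the efficiency of bitwise operations comes from.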
2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling
Xiuying Wei | Yunchen Zhang | Yuhang Li | Xiangguo Zhang | Ruihao Gong | Jinyang Guo | Xianglong Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Post-training quantization (PTQ) of transformer language models faces significant challenges due to the existence of detrimental outliers in activations. We observe that these outliers are concentrated in specific channels and are asymmetric across channels. To address this issue, we propose the Outlier Suppression+ (OS+) framework, which contains channel-wise shifting for asymmetry and channel-wise scaling for concentration. We show that these operations can be seamlessly migrated into subsequent modules while maintaining equivalence. In addition, we propose a fast and stable scheme to calculate effective shifting and scaling values. The channel-wise shifting aligns the center of each channel to remove outlier asymmetry. The channel-wise scaling quantitatively evaluates changes brought by migration and quantization for a better balance of the quantization burden. We validate OS+ under both standard and fine-grained quantization settings with models including BERT, OPT, BLOOM, BLOOMZ, and LLaMA. Comprehensive results across various tasks demonstrate the superiority of our approach. In particular, with standard quantization, OS+ achieves near-floating-point performance on both small models and large language models at 8-bit and 6-bit. Besides, we establish a new state-of-the-art for 4-bit BERT with a 15.5% improvement. Our code is available at https://github.com/ModelTC/Outlier_Suppression_Plus.
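The equivalence that lets shifting and scaling be migrated into subsequent modules follows from a short identity: for a linear layer Y = X W^T + b, replacing column j of X by (X_j - z_j) / s_j is exactly compensated by multiplying column j of W by s_j and adding the sum over j of z_j * W_ij to bias b_i. The Python sketch below checks this numerically. The shift (channel midpoint) follows the abstract's description of aligning channel centers; the scaling rule with a fixed `threshold` is a hypothetical stand-in for the scaling values OS+ actually computes.

```python
from typing import List

Matrix = List[List[float]]


def linear(X: Matrix, W: Matrix, b: List[float]) -> Matrix:
    """Y = X W^T + b, with X: tokens x in_channels and W: out_channels x in_channels."""
    return [[sum(x_j * w_j for x_j, w_j in zip(x, w)) + b_i for w, b_i in zip(W, b)]
            for x in X]


def shift_and_scale(X: Matrix, W: Matrix, b: List[float], threshold: float):
    """Channel-wise shift z and scale s on the activations, compensated in W and b."""
    n_ch = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_ch)]
    # Shift: center each channel's range to remove asymmetry.
    z = [(max(c) + min(c)) / 2 for c in cols]
    # Scale (hypothetical rule): shrink channels whose half-range exceeds `threshold`.
    s = [max(1.0, (max(c) - min(c)) / 2 / threshold) for c in cols]
    X_t = [[(row[j] - z[j]) / s[j] for j in range(n_ch)] for row in X]
    W_t = [[w[j] * s[j] for j in range(n_ch)] for w in W]        # undo the scale
    b_t = [b_i + sum(z_j * w_j for z_j, w_j in zip(z, w))        # undo the shift
           for w, b_i in zip(W, b)]
    return X_t, W_t, b_t


# Usage: channel 1 is an asymmetric outlier channel.
X = [[1.0, 10.0], [3.0, -2.0], [2.0, 30.0]]
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
X_t, W_t, b_t = shift_and_scale(X, W, b, threshold=4.0)
Y, Y_t = linear(X, W, b), linear(X_t, W_t, b_t)  # equal up to float rounding
```

After the transform, channel 1 spans [-4, 4] instead of [-2, 30], so a single quantization range fits it far better, while the layer output is unchanged.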
Adaptive Contrastive Knowledge Distillation for BERT Compression
Jinyang Guo | Jiaheng Liu | Zining Wang | Yuqing Ma | Ruihao Gong | Ke Xu | Xianglong Liu
Findings of the Association for Computational Linguistics: ACL 2023
In this paper, we propose a new knowledge distillation approach called adaptive contrastive knowledge distillation (ACKD) for BERT compression. Different from existing knowledge distillation methods for BERT that implicitly learn discriminative student features by mimicking the teacher features, we first introduce a novel contrastive distillation loss (CDL) based on hidden state features in BERT as explicit supervision to learn discriminative student features. We further observe that sentences with similar features may have completely different meanings, which makes them hard to distinguish. Existing methods do not pay sufficient attention to these hard samples with less discriminative features. Therefore, we propose a new strategy called sample adaptive reweighting (SAR) to adaptively pay more attention to these hard samples and strengthen their discrimination abilities. We incorporate our SAR strategy into our CDL to form the adaptive contrastive distillation loss, based on which we construct our ACKD framework. Comprehensive experiments on multiple natural language processing tasks demonstrate the effectiveness of our ACKD framework.
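A contrastive distillation loss on hidden-state features is commonly written in an InfoNCE form, where each student feature must pick out its own teacher feature among the teacher features of the other samples in a batch. The Python sketch below assumes that form, cosine similarity, and a temperature `tau`; `hardness_weights` is a hypothetical stand-in for sample adaptive reweighting that up-weights samples with a close negative. The abstract does not give ACKD's exact CDL or SAR formulas.

```python
import math
from typing import List, Optional

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_distill_loss(student: List[Vec], teacher: List[Vec], tau: float = 0.1,
                             weights: Optional[List[float]] = None) -> float:
    """InfoNCE-style loss: student feature i should be closer to teacher feature i
    than to teacher features j != i. Optional per-sample weights reweight the mean."""
    n = len(student)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for i in range(n):
        logits = [cosine(student[i], teacher[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += weights[i] * (log_denom - logits[i])
    return total / sum(weights)


def hardness_weights(student: List[Vec], teacher: List[Vec], gamma: float = 1.0) -> List[float]:
    """Hypothetical reweighting: a sample whose most similar negative teacher
    feature is close gets a larger weight (requires at least two samples)."""
    n = len(student)
    return [math.exp(gamma * max(cosine(student[i], teacher[j]) for j in range(n) if j != i))
            for i in range(n)]


# Usage: aligned teacher features should yield a lower loss than shuffled ones.
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = contrastive_distill_loss(student, student)
shuffled = contrastive_distill_loss(student, [student[1], student[2], student[0]])
weighted = contrastive_distill_loss(student, student, weights=hardness_weights(student, student))
```

The negatives are what make the supervision explicitly discriminative: unlike plain feature mimicking, the loss also pushes each student feature away from other samples' teacher features.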
Co-authors
- Jinyang Guo 6
- Jian Yang 6
- Aishan Liu 5
- Ruihao Gong 4
- Yuqing Ma 4
- Dacheng Tao 4
- Wei Zhang 4
- Linzheng Chai 3
- Bryan Dai 3
- Zhoujun Li 3
- Zonghao Ying 3
- Yifu Ding 2
- Shuyue Guo 2
- Chuan Hao 2
- Yizhi Li 2
- Shukai Liu 2
- Weifeng Lv 2
- Chengtao Lv 2
- Tianbo Wang 2
- Jiakai Wang 2
- Jiajun Wu 2
- Zhange Zhang 2
- Yunchen Zhang 2
- Quanchen Zou 2
- Linghan Ai 1
- Yan Bai 1
- Keyi Chen 1
- Hong Chen 1
- Liang Ding 1
- Bo Du 1
- Jinyang Du 1
- Jun Feng 1
- Jianle Gan 1
- Weicheng Gu 1
- Shiqiao Gu 1
- Shan He 1
- Siqi He 1
- Yushi Huang 1
- Wentao Jiang 1
- Bo Jiang 1
- Yongcheng Jing 1
- Zonglei Jing 1
- Leo L 1
- Yuhang Li 1
- Tianlin Li 1
- Siyuan Liang 1
- Kewei Liao 1
- Dahua Lin 1
- Che Liu 1
- Shunyu Liu 1
- Ziwei Liu 1
- Jiaheng Liu 1
- Xuebo Liu 1
- Yihang Lou 1
- Yuchi Ma 1
- Xudong Ma 1
- Yunke Peng 1
- Haotong Qin 1
- Yangguang Shao 1
- Ensheng Shi 1
- Junzheng Shi 1
- Yuyang Song 1
- Mingjie Tang 1
- Ran Tao 1
- Yulong Wang 1
- Jing Wang 1
- Jianzhou Wang 1
- Yingjie Wang 1
- Zengmao Wang 1
- Zining Wang 1
- Zining Wang 1
- Yao Wang 1
- Yaoyuan Wang 1
- Xiuying Wei 1
- Siwei Wu 1
- Wxuefei 1
- Yisong Xiao 1
- Yan Xing 1
- Ke Xu 1
- Gan Xu 1
- Lei Yan 1
- Xiaokun Yang 1
- Chengzhao Yang 1
- Yaodong Yang (杨耀东) 1
- Zhengmao Ye 1
- Zhenfei Yin 1
- Yang Yong 1
- Jing Zhang 1
- Xiangguo Zhang 1
- Deyue Zhang 1
- Xiangzheng Zhang 1
- Wenxin Zhang 1
- Mingchuan Zhang 1
- Min Zhang 1
- Yaoyuan Zhang 1
- Wayne Xin Zhao 1
- Tianyu Zheng 1
- Xiabin Zhou 1