AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

Zhiyuan Li; Yuan Wu; Yi Chang

Abstract

Gradient clipping is a nearly ubiquitous heuristic for alleviating exploding gradients and stabilizing the training of Large Language Models (LLMs). However, traditional global norm clipping erroneously presupposes that gradients are homogeneous across functional modules. This produces an adverse "spill-over" effect, in which volatile parameters force unnecessary scaling of stable ones. To overcome this, we propose Adaptive Group-wise Gradient Clipping (AGGC). AGGC partitions parameters into groups by functional type and regulates each group according to its historical behavior, tracked with an Exponential Moving Average (EMA). Specifically, it constructs an adaptive interval that simultaneously mitigates gradient explosion and vanishing, and it employs a time-dependent scheduling mechanism to balance exploration and convergence. Experiments on LLaMA 2-7B, Mistral-7B, and Gemma-7B show that AGGC-enhanced LoRA consistently outperforms standard LoRA and frequently exceeds Full Fine-Tuning performance. For example, on the GSM8K benchmark, Mistral-7B fine-tuned with AGGC-enhanced LoRA achieves 72.93% accuracy, compared with 69.5% for vanilla LoRA. AGGC also contributes to the stability of Reinforcement Learning with Verifiable Rewards (RLVR), leading to improved logical deduction in Qwen 2.5 and Llama 3.2 models. These results indicate that AGGC's modular, adaptive clipping strategy addresses the limitations of traditional gradient clipping, in particular its failure to account for gradient heterogeneity, and thereby stabilizes training. Owing to its lightweight design, AGGC can be integrated into existing post-training pipelines with negligible overhead.
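
The abstract describes AGGC only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It contrasts standard global-norm clipping, where one shared scale factor lets a volatile group shrink a stable one (the spill-over effect), with a hypothetical per-group clipper. The clipper assumes that each group's history is the EMA of its L2 gradient norm, that the adaptive interval is [lower factor × EMA, upper factor × EMA], and that the time-dependent schedule linearly narrows this interval over training. All names (`global_norm_clip`, `GroupClipper`, `beta`, `lower_start`, etc.), group keys such as "attn" and "mlp", and default values are illustrative assumptions; plain Python lists stand in for tensors.

```python
import math
from dataclasses import dataclass, field


def l2_norm(vec: list[float]) -> float:
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in vec))


def global_norm_clip(groups: dict[str, list[float]], max_norm: float) -> dict[str, list[float]]:
    """Standard global-norm clipping: every group shares one scale factor."""
    total = math.sqrt(sum(l2_norm(v) ** 2 for v in groups.values()))
    scale = min(1.0, max_norm / (total + 1e-12))
    return {name: [g * scale for g in v] for name, v in groups.items()}


@dataclass
class GroupClipper:
    """Hypothetical per-group clipper with an EMA-based adaptive interval."""
    beta: float = 0.9          # EMA decay (assumed value)
    lower_start: float = 0.5   # lower-bound factor at step 0 (assumed)
    upper_start: float = 2.0   # upper-bound factor at step 0 (assumed)
    lower_end: float = 0.8     # lower-bound factor after total_steps (assumed)
    upper_end: float = 1.25    # upper-bound factor after total_steps (assumed)
    total_steps: int = 1000
    step: int = 0
    ema: dict[str, float] = field(default_factory=dict)

    def interval_factors(self) -> tuple[float, float]:
        # Assumed linear schedule: a wide interval early (exploration)
        # that narrows toward [lower_end, upper_end] (convergence).
        t = min(self.step / self.total_steps, 1.0)
        lo = self.lower_start + t * (self.lower_end - self.lower_start)
        hi = self.upper_start + t * (self.upper_end - self.upper_start)
        return lo, hi

    def clip(self, groups: dict[str, list[float]]) -> dict[str, list[float]]:
        lo_f, hi_f = self.interval_factors()
        out: dict[str, list[float]] = {}
        for name, grad in groups.items():
            norm = l2_norm(grad)
            ema = self.ema.get(name, 0.0)
            if ema <= 0.0:
                # No informative history yet: pass through and seed the EMA.
                out[name] = list(grad)
                self.ema[name] = norm
                continue
            # Project this group's norm into [lo_f * ema, hi_f * ema]:
            # shrink spikes (explosion) and boost collapsed norms (vanishing).
            target = min(max(norm, lo_f * ema), hi_f * ema)
            scale = target / norm if norm > 0.0 else 1.0
            out[name] = [g * scale for g in grad]
            # Update history with the clipped norm so one spike cannot inflate it.
            clipped_norm = target if norm > 0.0 else 0.0
            self.ema[name] = self.beta * ema + (1.0 - self.beta) * clipped_norm
        self.step += 1
        return out


if __name__ == "__main__":
    spike = {"attn": [30.0, 40.0], "mlp": [0.3, 0.4]}
    # Global clipping: the attention spike also shrinks the stable MLP group.
    print("global mlp norm:", l2_norm(global_norm_clip(spike, 1.0)["mlp"]))
    clipper = GroupClipper(total_steps=100)
    clipper.clip({"attn": [3.0, 4.0], "mlp": [0.3, 0.4]})  # seeds the EMAs
    out = clipper.clip(spike)
    print("group attn norm:", l2_norm(out["attn"]), "group mlp norm:", l2_norm(out["mlp"]))
```

In this sketch, the spike is capped relative to the attention group's own history, while the MLP gradient passes through untouched. Updating the EMA with the clipped rather than the raw norm is a design choice made here for robustness to isolated spikes; the paper may track history differently.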

Anthology ID: 2026.findings-acl.339
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6837–6851
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.339/
Cite (ACL): Zhiyuan Li, Yuan Wu, and Yi Chang. 2026. AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6837–6851, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.339.pdf
Checklist: 2026.findings-acl.339.checklist.pdf

PDF Cite Search Checklist Fix data