Xuezhi Fang
2025
Position-Aware Depth Decay Decoding (D3): Boosting Large Language Model Inference Efficiency
Siqi Fan
|
Xuezhi Fang
|
Xingrun Xing
|
Peng Han
|
Shuo Shang
|
Yequan Wang
Findings of the Association for Computational Linguistics: ACL 2025
Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. Unlike traditional model compression, which needs retraining, recent dynamic computation methods show that not all components are required for inference, enabling a training-free pipeline.In this paper, we focus on the dynamic depth of LLM generation. A token-position aware layer skipping framework is proposed to save 1.5x times operations efficiently while maintaining performance.We first observed that tokens predicted later have lower perplexity and thus require less computation. Then, we propose a training-free algorithm called Position-Aware Depth Decay Decoding (), which leverages a power-law decay function, \left\lfloor L × (𝛼i) \right\rfloor, to determine the number of layers to retain when generating token Ti. Remarkably, without any retraining, the achieves success across a wide range of generation tasks for the first time.Experiments on large language models (the Llama) with 7 ∼ 70 billion parameters show that can achieve an average 1.5x speedup compared with the full-inference pipeline while maintaining comparable performance with nearly no performance drop (<1%) on the GSM8K and BBH benchmarks.
2022
CTAP for Chinese:A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui
|
Junhui Zhu
|
Liner Yang
|
Xuezhi Fang
|
Xiaobin Chen
|
Yujie Wang
|
Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The construct of linguistic complexity has been widely used in language learning research. Several text analysis tools have been created to automatically analyze linguistic complexity. However, the indexes supported by several existing Chinese text analysis tools are limited and different because of different research purposes. CTAP is an open-source linguistic complexity measurement extraction tool, which prompts any research purposes. Although it was originally developed for English, the Unstructured Information Management (UIMA) framework it used allows the integration of other languages. In this study, we integrated the Chinese component into CTAP, describing the index sets it incorporated and comparing it with three linguistic complexity tools for Chinese. The index set includes four levels of 196 linguistic complexity indexes: character level, word level, sentence level, and discourse level. So far, CTAP has implemented automatic calculation of complexity characteristics for four languages, aiming to help linguists without NLP background study language complexity.