Xuezhi Fang
2026
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
Siqi Fan | Xiusheng Huang | Yiqun Yao | Xuezhi Fang | Kang Liu | Peng Han | Shuo Shang | Aixin Sun | Yequan Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) can carry out human-like dialogue, but unlike humans, they are stateless due to the superposition property. However, during multi-turn, multi-agent interactions, LLMs begin to exhibit consistent, character-like behaviors, hinting at a form of emergent lifelong learning. Despite this, existing benchmarks often fail to capture these dynamics, focusing primarily on static, open-ended evaluations. To address this gap, we introduce LifeState-BENCH, a benchmark designed to assess lifelong learning in LLMs. It features two episodic datasets (Hamlet and a synthetic script collection) that are rich in narrative structure and character interactions. Our fact-checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking across both parametric and non-parametric approaches. In experiments on models such as Llama3.1-8B, GPT-4-turbo, and DeepSeek R1, we demonstrate that non-parametric methods significantly outperform parametric ones in managing stateful learning. However, all models struggle with catastrophic forgetting as interactions extend, highlighting the need for further advancements in lifelong learning.
2025
Position-Aware Depth Decay Decoding (D3): Boosting Large Language Model Inference Efficiency
Siqi Fan | Xuezhi Fang | Xingrun Xing | Peng Han | Shuo Shang | Yequan Wang
Findings of the Association for Computational Linguistics: ACL 2025
Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. Unlike traditional model compression, which needs retraining, recent dynamic computation methods show that not all components are required for inference, enabling a training-free pipeline. In this paper, we focus on the dynamic depth of LLM generation. A token-position-aware layer skipping framework is proposed to reduce operations by 1.5x while maintaining performance. We first observe that tokens predicted later have lower perplexity and thus require less computation. Then, we propose a training-free algorithm called Position-Aware Depth Decay Decoding (D3), which leverages a power-law decay function, ⌊L × α^i⌋, to determine the number of layers to retain when generating token T_i. Remarkably, without any retraining, D3 achieves success across a wide range of generation tasks for the first time. Experiments on large language models (the Llama family) with 7 to 70 billion parameters show that D3 can achieve an average 1.5x speedup compared with the full-inference pipeline, with nearly no performance drop (<1%) on the GSM8K and BBH benchmarks.
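The depth schedule in this abstract can be illustrated with a minimal Python sketch. It reads the decay function as ⌊L × α^i⌋, with L the total number of layers and i the position of the generated token. The function name `layers_to_keep`, the 0-based position index, the `min_layers` floor, and the example values of L and α are assumptions made for illustration; they are not taken from the paper, and the sketch does not show which layers are skipped.

```python
import math


def layers_to_keep(total_layers: int, position: int, alpha: float,
                   min_layers: int = 1) -> int:
    """Number of layers retained for the token at `position`.

    Computes floor(L * alpha**i). The `min_layers` floor and the 0-based
    position index are illustrative assumptions, not part of the abstract.
    """
    if total_layers < 1:
        raise ValueError("total_layers must be at least 1")
    if position < 0:
        raise ValueError("position must be non-negative")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    kept = math.floor(total_layers * alpha ** position)
    # Clamp so that at least `min_layers` and at most all layers are used.
    return max(min_layers, min(total_layers, kept))


if __name__ == "__main__":
    # Hypothetical setting: a 32-layer model with alpha = 0.99.
    for i in (0, 10, 50, 100):
        print(i, layers_to_keep(32, i, 0.99))
```

With α close to 1 the schedule decays slowly, so early tokens use nearly the full depth and later tokens use progressively fewer layers, which matches the observation that later tokens need less computation.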
2022
CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The construct of linguistic complexity has been widely used in language learning research. Several text analysis tools have been created to automatically analyze linguistic complexity. However, the indexes supported by existing Chinese text analysis tools are limited and vary with each tool's research purpose. CTAP is an open-source tool for extracting linguistic complexity measures, intended to serve a wide range of research purposes. Although it was originally developed for English, the Unstructured Information Management (UIMA) framework it uses allows the integration of other languages. In this study, we integrate a Chinese component into CTAP, describe the index sets it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set includes 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. So far, CTAP has implemented automatic calculation of complexity features for four languages, aiming to help linguists without an NLP background study language complexity.