Zefeng Zhang
2026
Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping
Yao Chen | Yilong Chen | Yinqi Yang | Junyuan Shang | Zhenyu Zhang | Zefeng Zhang | Shuaiyi Nie | Shuohuan Wang | Yu Sun | Hua Wu | Haifeng Wang | Tingwen Liu
Findings of the Association for Computational Linguistics: ACL 2026
Existing approaches to increasing the effective depth of Transformers predominantly rely on parameter reuse, extending computation through recursive execution. Under this paradigm, the network structure remains static along the training timeline, and additional computational depth is uniformly assigned to entire blocks at the parameter level. This rigidity across training time and parameter space leads to substantial computational redundancy during training. In contrast, we argue that depth allocation during training should not be a static preset, but rather a progressively growing structural process. Our systematic analysis reveals a deep-to-shallow maturation trajectory across layers, where high-entropy attention heads play a crucial role in semantic integration. Motivated by this observation, we introduce the Sparse Growing Transformer (SGT). SGT is a training-time sparse depth allocation framework that progressively extends recurrence from deeper to shallower layers via targeted attention looping on informative heads. This mechanism induces structural sparsity by selectively increasing depth only for a small subset of parameters as training evolves. Extensive experiments across multiple parameter scales demonstrate that SGT consistently outperforms training-time static block-level looping baselines under comparable settings, while reducing the additional training FLOPs overhead from approximately 16–20% to only 1–3% relative to a standard Transformer backbone.
2025
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Wenyuan Zhang | Shuaiyi Nie | Jiawei Sheng | Zefeng Zhang | Xinghua Zhang | Yongquan He | Tingwen Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language model (LLM) role-playing has gained widespread attention. Authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing works usually overlook LLMs’ ability to detect characters’ known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, which leads to low-quality automatic construction of trainable character corpora. In this paper, we propose RoleKE-Bench to evaluate LLMs’ ability to detect KKE and UKE. The results indicate that even the latest LLMs struggle to detect these two types of errors effectively, especially when it comes to familiar knowledge. We experiment with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S2RD), to further explore the potential for improving error detection capabilities.
2024
Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking
Zefeng Zhang | Jiawei Sheng | Chuang Zhang | Yunzhi Liang | Wenyuan Zhang | Siqi Wang | Tingwen Liu
Findings of the Association for Computational Linguistics: ACL 2024
Multimodal entity linking (MEL) aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph. A pivotal challenge is to fully leverage multi-element correlations between mentions and entities to bridge the modality gap and enable fine-grained semantic matching. Existing methods attempt several local correlative mechanisms, relying heavily on automatically learned attention weights, which may over-concentrate on partial correlations. To mitigate this issue, we formulate the correlation assignment problem as an optimal transport (OT) problem, and propose a novel MEL framework, namely OT-MEL, with OT-guided correlation assignment. Thereby, we exploit the correlation between multimodal features to enhance multimodal fusion, and the correlation between mentions and entities to enhance fine-grained matching. To accelerate model prediction, we further leverage knowledge distillation to transfer OT assignment knowledge to the attention mechanism. Experimental results show that our model significantly outperforms previous state-of-the-art baselines and confirm the effectiveness of the OT-guided correlation assignment.
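
To give a rough sense of what casting correlation assignment as an optimal transport problem can look like, the sketch below runs generic entropy-regularised Sinkhorn iterations on a toy cost matrix. It is an illustration only and not the OT-MEL implementation: the function name `sinkhorn`, the regularisation strength `eps`, the iteration count, the uniform marginals, and the toy costs are all assumptions made for this example.

```python
import math

def sinkhorn(cost, a, b, eps=0.5, iters=200):
    """Approximate an entropy-regularised OT plan between histograms a and b.

    cost[i][j] is a hypothetical dissimilarity between source element i
    (e.g. a mention token) and target element j (e.g. an entity token).
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: plan[i][j] is the mass assigned from i to j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example (assumed values): 2 source elements, 3 target elements.
cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9]]
a = [0.5, 0.5]               # uniform mass over source elements
b = [1 / 3, 1 / 3, 1 / 3]    # uniform mass over target elements
plan = sinkhorn(cost, a, b)
```

Unlike independently normalised attention weights, the resulting plan is constrained by both marginals, so every source and target element must take part in the assignment; this is one plausible reading of how an OT formulation discourages over-concentration on partial correlations.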