Cheng Deng
2026
Logical Structure as Knowledge: Enhancing LLM Reasoning via Structured Logical Knowledge Density Estimation
Zhen Bi | Zhenlin Hu | Xueshu Chen | Mingyang Chen | Cheng Deng | Yida Xue | Zhen Wang | Qing Shen | Ningyu Zhang | Jungang Lou
Findings of the Association for Computational Linguistics: ACL 2026
The reasoning capabilities of Large Language Models (LLMs) are increasingly attributed to training data quality rather than mere parameter scaling. However, existing data-centric paradigms often equate quality with factuality or diversity and ignore the internal logical complexity of training samples. In this work, we propose that natural language harbors Structured Logical Knowledge manifested through entailment relationships and logical topologies. To quantify this, we introduce Structured Logical Knowledge Density (SLKD), a novel metric that measures logical information content by decomposing natural language into executable predicates and logical primitives. Our analysis reveals a significant logical disparity in current datasets where sparse logical signals predominate. Consequently, we propose a density-aware re-cognizing optimization strategy that prioritizes high-density logical samples to align training with the model’s reasoning boundary. Extensive experiments demonstrate that our approach enhances reasoning performance and generalization without increasing total data volume. These results, further validated within a reinforcement learning framework, suggest that elevating logical density is more critical than expanding data scale for realizing the full cognitive potential of LLMs. The anonymized code is available in Appendix C.
GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning
Yule Xie | Jiaxin Ding | Cheng Deng | Shiqing Gao | Junran Zhang | Sibo Zhang | Zeyuan Wang | Ke Wu | Xin Ding | Luoyi Fu | Meng Jin | Xinbing Wang
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) has recently shown remarkable ability to enhance reasoning in large language models (LLMs), yet its potential in scientific domains beyond mathematics remains largely unexplored. Geoscience questions couple broad factual knowledge with multi-step inference and often rely on visual evidence such as maps, cross-sections, and diagrams, making them a challenging but verifiable testbed for RL-based reasoning. To enable this study, we introduce GeoMC-10K, a dataset of 10,000 geoscience multiple-choice questions spanning physical to human geography and high-school to professional levels; over 30% of the questions are image dependent. To support text-only RL on these multimodal questions, we design GeoM2T, a multi-agent framework that converts multimodal questions into descriptive text while preserving answerability and difficulty. Fine-tuning LLaMA-3.1-8B and Qwen-3-8B with Group Relative Policy Optimization (GRPO), incorporating a factual reward mechanism, yields GR1, which achieves absolute accuracy improvements of 5.9% and 13.3%, respectively, and it generalizes to out-of-distribution geoscience benchmarks. Together, GeoMC-10K, GeoM2T, and GR1 establish a scalable benchmark and baseline for RL-enhanced geoscience reasoning.
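As a rough illustration of the group-relative idea behind GRPO mentioned in the abstract above, the sketch below standardizes the rewards of several answers sampled for the same question. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for this example; the paper's factual reward mechanism is not specified in the abstract and is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled answers.

    `rewards` holds one scalar reward per answer sampled for the same
    question. Each answer is scored relative to the group mean, scaled by
    the group's (population) standard deviation. `eps` is an assumed
    stabilizer that avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one multiple-choice
# question: 1.0 for a correct choice, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average receive positive advantages and the rest receive negative ones, so no separate value model is needed to provide a baseline.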
VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models
Huawei Ji | Yuanhao Sun | Yuan Jin | Cheng Deng | Jiaxin Ding | Luoyi Fu | Xinbing Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Visual token pruning methods effectively mitigate the quadratic computational growth caused by processing high-resolution images and video frames in vision-language models (VLMs). However, existing approaches rely on predefined pruning configurations without determining whether they achieve computation-performance optimality. In this work, we introduce VisPCO, a novel framework that formulates visual token pruning as a Pareto configuration optimization problem to automatically identify optimal configurations. Our approach employs continuous relaxation and straight-through estimators to enable gradient-based search, solved via the Augmented Lagrangian method. Extensive experiments across 8 visual benchmarks demonstrate that VisPCO effectively approximates the empirical Pareto frontier obtained through grid search and generalizes well across various pruning methods and VLM architectures. Furthermore, through learnable kernel functions, we investigate layer-wise pruning patterns and reveal that multi-step progressive pruning captures VLMs’ hierarchical compression structure, achieving superior accuracy-efficiency trade-offs compared to single-layer approaches.
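To make the Augmented Lagrangian method named in the abstract above more concrete, here is a minimal sketch on a one-dimensional toy problem with a single budget-style inequality constraint. It is not the paper's formulation: the function `solve_budgeted`, the toy objective, and all hyperparameters (`rho`, step counts, `lr`) are assumptions chosen only to show the alternating primal update and multiplier update.

```python
def solve_budgeted(objective_grad, constraint, constraint_grad, x0,
                   rho=1.0, outer_steps=40, inner_steps=200, lr=0.1):
    """Minimize f(x) subject to g(x) <= 0 with an augmented Lagrangian.

    Inner loop: gradient descent on the augmented Lagrangian in x.
    Outer loop: multiplier update lam <- max(0, lam + rho * g(x)).
    """
    x, lam = x0, 0.0
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # Effective multiplier; zero while the constraint is inactive.
            mult = max(0.0, lam + rho * constraint(x))
            x -= lr * (objective_grad(x) + mult * constraint_grad(x))
        lam = max(0.0, lam + rho * constraint(x))
    return x, lam


# Toy problem (assumed for illustration): the unconstrained optimum is
# x = 2, but a "budget" caps x at 1, so the constrained optimum is x = 1.
x_opt, multiplier = solve_budgeted(
    objective_grad=lambda x: 2.0 * (x - 2.0),  # f(x) = (x - 2)^2
    constraint=lambda x: x - 1.0,              # g(x) = x - 1 <= 0
    constraint_grad=lambda x: 1.0,
    x0=0.0,
)
```

On this toy problem the iterate settles at the budget boundary and the multiplier approaches the slope of the objective there, which is the trade-off rate between the objective and the budget.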
Dual Activation-Weight Sparsity: A Training-Free Framework for Efficient Large Language Model Compression
Luoyang Sun | Guangyan Li | Cheng Deng | Haifeng Zhang | Jian Zhao | Yongqiang Tang | Wensheng Zhang | Jun Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) excel at natural language tasks but face deployment challenges due to computational demands. We introduce Dual Activation-Weight Sparsity (DAWS), a training-free framework that jointly exploits activation and weight sparsity through magnitude-based routing. Systematic analysis of pretrained transformers reveals two key observations: (1) the activation energy is concentrated in a few neurons, and (2) activation and weight sparsity patterns are complementary between attention and FFN layers. DAWS employs a three-tier routing strategy: high-magnitude activations pass through full-precision weights to preserve critical pathways, medium-magnitude activations use magnitude-pruned sparse weights for efficiency, and low-magnitude activations are directly discarded. Unlike prior work that uses activation-aware pruning methods like WANDA, our approach uses direct magnitude-based pruning, which we show is more robust to sample-level variations. Experiments on Llama and Mistral models demonstrate that DAWS maintains >98% of dense model performance at 50% sparsity, outperforming WANDA, TEAL, and R-Sparse.
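The three-tier routing described in the abstract above can be sketched for a single matrix-vector product as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed `low` and `high` thresholds, and the global magnitude-pruning rule are hypothetical, since the abstract does not say how tier boundaries are chosen.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction of the entries.

    Entries tied with the cutoff magnitude are also zeroed, so slightly
    more than `sparsity` of the matrix may be pruned.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    cutoff = flat[k - 1]
    return [[w if abs(w) > cutoff else 0.0 for w in row] for row in weights]


def three_tier_matvec(weights, x, low, high, sparsity=0.5):
    """Compute y = W x with per-activation routing by magnitude.

    |x_j| >= high       : use the dense weight column (critical pathway)
    low <= |x_j| < high : use the magnitude-pruned weight column
    |x_j| < low         : discard the activation entirely
    """
    pruned = magnitude_prune(weights, sparsity)
    out = [0.0] * len(weights)
    for j, xj in enumerate(x):
        mag = abs(xj)
        if mag < low:
            continue  # low tier: contributes nothing
        source = weights if mag >= high else pruned
        for i in range(len(weights)):
            out[i] += source[i][j] * xj
    return out


# Hypothetical 2x3 weight matrix and activation vector.
W = [[1.0, 0.25, 2.0],
     [0.5, 3.0, 0.125]]
y = three_tier_matvec(W, [2.0, 0.5, 0.0625], low=0.125, high=1.0)
```

In this example the first activation takes the dense path, the second takes the pruned path, and the third is dropped, so the result differs from the dense product only by the small terms the routing skips.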
Co-authors
- Jiaxin Ding 2
- Luoyi Fu 2
- Xinbing Wang 2
- Zhen Bi 1
- Mingyang Chen 1
- Xueshu Chen 1
- Xin Ding 1
- Shiqing Gao 1
- Zhenlin Hu 1
- Huawei Ji 1
- Meng Jin 1
- Yuan Jin 1
- Guangyan Li 1
- Jungang Lou 1
- Qing Shen 1
- Luoyang Sun 1
- Yuanhao Sun 1
- Yongqiang Tang 1
- Jun Wang 1
- Zeyuan Wang 1
- Zhen Wang 1
- Ke Wu 1
- Yule Xie 1
- Yida Xue 1
- Haifeng Zhang 1
- Junran Zhang 1
- Ningyu Zhang 1
- Sibo Zhang 1
- Wensheng Zhang 1
- Jian Zhao 1