Fanyu Meng
2026
Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning
Siyuan Gan | Jiaheng Liu | Boyan Wang | Tianpei Yang | Runqing Miao | Yuyao Zhang | Fanyu Meng | Junlan Feng | Linjian Meng | Jing Huo | Yang Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large reasoning models (LRMs) have attracted much attention due to their exceptional performance. However, their performance mainly stems from thinking, a long Chain of Thought (CoT), which significantly increases computational overhead. To address this overthinking problem, existing work focuses on using reinforcement learning (RL) to train hybrid reasoning models that automatically decide whether to engage in thinking or not based on the complexity of the query. Unfortunately, RL suffers from the reward hacking problem, e.g., the model engages in thinking but is judged as not doing so, resulting in incorrect rewards. To mitigate this problem, existing works either employ supervised fine-tuning (SFT), which incurs high computational costs, or enforce uniform token limits on non-thinking responses, which yields limited mitigation of the problem. In this paper, we propose Thinking-Based Non-Thinking (TNT). It does not employ SFT, and sets different maximum token usage for responses not using thinking across various queries by leveraging information from the solution component of the responses using thinking. Experiments on five mathematical benchmarks demonstrate that TNT reduces token usage by around $50\%$ compared to DeepSeek-R1-Distill-Qwen-1.5B/7B and DeepScaleR-1.5B, while significantly improving accuracy. In fact, TNT achieves the optimal trade-off between accuracy and efficiency among all tested methods. Additionally, the probability of reward hacking in TNT’s responses that are classified as not using thinking remains below $10\%$ across all tested datasets.
2025
PD3F: A Pluggable and Dynamic DoS-Defense Framework against resource consumption attacks targeting Large Language Models
Yuanhe Zhang | Xinyue Wang | Haoran Gao | Zhenhong Zhou | Fanyu Meng | Yuyao Zhang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs), due to substantial computational requirements, are vulnerable to resource consumption attacks, which can severely degrade server performance or even cause crashes, as demonstrated by denial-of-service (DoS) attacks designed for LLMs. However, existing works lack mitigation strategies against such threats, resulting in unresolved security risks for real-world LLM deployments. To this end, we propose the Pluggable and Dynamic DoS-Defense Framework (PD3F), which employs a two-stage approach to defend against resource consumption attacks from both the input and output sides. On the input side, we propose the Resource Index to guide Dynamic Request Polling Scheduling, thereby reducing computing resource usage induced by malicious prompts under high-concurrency scenarios. On the output side, we introduce the Adaptive End-Based Suppression mechanism, which reduces excessive malicious generation. Experiments across six models demonstrate that PD3F significantly mitigates resource consumption attacks, improving users’ access capacity by up to 500% during adversarial load. PD3F represents a step toward the resilient and resource-aware deployment of LLMs against resource consumption attacks.
2020
A structure-enhanced graph convolutional network for sentiment analysis
Fanyu Meng | Junlan Feng | Danping Yin | Si Chen | Min Hu
Findings of the Association for Computational Linguistics: EMNLP 2020
Syntactic information is essential for both sentiment analysis (SA) and aspect-based sentiment analysis (ABSA). Previous work has already achieved great progress utilizing a Graph Convolutional Network (GCN) over the dependency tree of a sentence. However, these models do not fully exploit the syntactic information obtained from dependency parsing, such as the diversified types of dependency relations. The message passing process of GCN should be distinguished based on this syntactic information. To tackle this problem, we design a novel weighted graph convolutional network (WGCN) which can exploit rich syntactic information based on feature combination. Furthermore, we utilize BERT instead of Bi-LSTM to generate contextualized representations as inputs for GCN and present an alignment method to keep word-level dependencies consistent with the wordpiece units of BERT. With our proposal, we are able to improve the state-of-the-art on four ABSA tasks out of six and two SA tasks out of three.
Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots
Yuanmeng Yan | Keqing He | Hong Xu | Sihong Liu | Fanyu Meng | Min Hu | Weiran Xu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Open-vocabulary slots, such as file name, album name, or schedule title, significantly degrade the performance of neural-based slot filling models since these slots can take on values from a virtually unlimited set and have neither a semantic restriction nor a length limit. In this paper, we propose a robust adversarial model-agnostic slot filling method that explicitly decouples local semantics inherent in open-vocabulary slot words from the global context. We aim to depart from entangled contextual semantics and focus more on the holistic context at the level of the whole sentence. Experiments on two public datasets show that our method consistently outperforms other methods by a statistically significant margin on all the open-vocabulary slots without deteriorating the performance of normal slots.