Mingyang Sun

2026

The slow thinking paradigm has been widely validated to enhance the reasoning capabilities of Large Language Models (LLMs), but it introduces notable reasoning inefficiencies: models often overthink simple tasks while prematurely shifting their reasoning paths when addressing complex problems. To address this, we propose AdapThink, a simple yet efficient framework for adaptive reasoning preference control. Unlike methods imposing uniform length constraints, AdapThink dynamically adjusts reflection preferences based on group-level distributional statistics of reasoning length and reflection intensity. We further introduce a dispersion-based diversity sampling mechanism that maximizes the geometric spread of reasoning patterns, accelerating learning through exposure to diverse problem-solving strategies. Across mathematical reasoning and code generation benchmarks, AdapThink reduces average response length by 17.1%-21.4% while improving performance by 6.12-6.59 points under 32K token budgets, demonstrating superior efficiency and robustness against reward hacking compared to strong baselines.

2023

Value type: the bridge to a better DST model
Gao Qixiang | Mingyang Sun | Yutao Mou | Chen Zeng | Weiran Xu
Findings of the Association for Computational Linguistics: ACL 2023

The value types of slots can provide useful information for dialogue state tracking (DST), yet most previous work has ignored them. In this paper, we propose a new DST framework based on these value types. First, we extract token types from each turn: we divide the slots in the dataset into 9 categories according to the type of slot value, and then train an NER model to extract the corresponding type-entities from the tokens of each conversation turn. Second, we modify the attention mechanism between the slot and the conversation history to incorporate value type information, helping each slot pay more attention to the turns that contain the same value type. We also introduce a sampling strategy for integrating these types into the attention formula, which reduces the impact of NER model errors. Finally, we conduct comprehensive experiments on two multi-domain task-oriented conversation datasets, MultiWOZ 2.1 and MultiWOZ 2.4. The ablation results show that our method is effective on both datasets, which verifies the necessity of considering the type of slot value.

2022

Collecting dialogue data with domain-slot-value labels for dialogue state tracking (DST) can be a costly process. In this paper, we propose a novel framework based on domain-slot related descriptions to tackle the challenge of few-shot cross-domain DST. Specifically, we design an extraction module to extract domain-slot related verbs and nouns from the dialogue. We then integrate them into the description, which prompts the model to identify the slot information. Furthermore, we introduce a random sampling strategy to improve the domain generalization ability of the model. We use a pre-trained model to encode contexts and descriptions and to generate answers in an auto-regressive manner. Experimental results show that our approach substantially outperforms existing few-shot DST methods on MultiWOZ and yields strong improvements in slot accuracy compared to existing slot description methods.

Co-authors

Xu Wan 1

Venues

Findings 2
EMNLP 1
