Ke Shi
2026
PACE: Prefix-Protected and Difficulty-Aware Compression for Efficient Reasoning
Ruixiang Feng | Yuntao Wen | Silin Zhou | Ke Shi | Yifan Wang | Ran Le | Zhenwei An | Zongchao Chen | Chen Yang | Guangyue Peng | Yiming Jia | Dongsheng Wang | Tao Zhang | Lisi Chen | Yang Song | Shen Gao | Shuo Shang
Findings of the Association for Computational Linguistics: ACL 2026
Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose PACE, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to maintain valid reasoning paths while promoting conciseness. At the group level, difficulty-aware penalty dynamically scales length constraints based on query complexity, maintaining exploration for harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to 55.7%) while simultaneously improving accuracy (up to 4.1%) on math benchmarks, with generalization ability to code, science, and general domains.
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
JunShuo Zhang | Chengrui Huang | Feng Guo | Zihan Li | Ke Shi | Menghua Jiang | Jiguo Yu | Shuo Shang | Shen Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language model (LLM) agents that follow the sequential “reason-then-act” paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration and incomplete environmental understanding, as they interact with only a single environment per step. In this paper, we first introduce a novel paradigm that enables an agent to interact with multiple environments simultaneously and share cross-trajectory experiences. Building upon this paradigm, we further propose Diverse Parallel Exploration Policy Optimization (DPEPO), a reinforcement learning (RL) algorithm that encourages the agent to perform diverse parallel exploration. There are two stages in DPEPO: initial supervised fine-tuning (SFT) imparts basic parallel reasoning and action generation, followed by a reinforcement learning stage with a hierarchical reward scheme. We design a parallel trajectory-level success reward and two step-level rewards: Diverse Action Reward and Diverse State Transition Reward, which actively penalize behavioral redundancy and promote broad exploration. Extensive experiments on ALFWorld and ScienceWorld show that DPEPO achieves state-of-the-art (SOTA) success rates, while maintaining comparable efficiency to strong sequential baselines.
2024
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models
Chen Zhang | Chengguang Tang | Dading Chong | Ke Shi | Guohua Tang | Feng Jiang | Haizhou Li
Findings of the Association for Computational Linguistics: EMNLP 2024
Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the “TS-Align” framework, which fine-tunes a policy model using pairwise feedback data automatically mined from its outputs. This automatic mining process is efficiently accomplished through the collaboration between a large-scale teacher model and a small-scale student model. The policy fine-tuning process can be iteratively repeated using on-policy generations within our proposed teacher-student collaborative framework. Through extensive experiments, we demonstrate that our final aligned policy outperforms the base policy model with an average win rate of 69.7% across seven conversational or instruction-following datasets. Furthermore, we show that the ranking capability of the teacher is effectively distilled into the student through our pipeline, resulting in a small-scale yet effective reward model for policy model alignment.
2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Chen Zhang | Luis D’Haro | Chengguang Tang | Ke Shi | Guohua Tang | Haizhou Li
Findings of the Association for Computational Linguistics: EMNLP 2023
Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been driven by the progress in pre-trained language models and the availability of dialogue data with high-quality human annotations. However, current studies predominantly concentrate on English dialogues, and the generalization of these metrics to other languages has not been fully examined. This is largely due to the absence of a multilingual dialogue evaluation benchmark. To address the issue, we introduce xDial-Eval, built on top of open-source English dialogue evaluation datasets. xDial-Eval includes 12 turn-level and 6 dialogue-level English datasets, comprising 14930 annotated turns and 8691 annotated dialogues respectively. The English dialogue data are extended to nine other languages with commercial machine translation systems. On xDial-Eval, we conduct comprehensive analyses of previous BERT-based metrics and the recently emerged large language models. Lastly, we establish strong self-supervised and multilingual baselines. In terms of average Pearson correlations over all datasets and languages, the best baseline outperforms OpenAI’s ChatGPT by absolute improvements of 6.5% and 4.6% at the turn and dialogue levels respectively, albeit with far fewer parameters. The data and code are publicly available at https://github.com/e0397123/xDial-Eval.
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar | Chen Zhang | Chengguang Tang | Ke Shi | Sarik Ghazarian | João Sedoc | Luis Fernando D’Haro | Alexander I. Rudnicky
Proceedings of the Eleventh Dialog System Technology Challenge
The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation. Automatic evaluation of open-domain dialogue systems as an open challenge has been the center of attention of many researchers. Despite the consistent efforts to improve automatic metrics’ correlations with human evaluation, there have been very few attempts to assess their robustness over multiple domains and dimensions. Also, their focus is mainly on the English language. All of these challenges prompt the development of automatic evaluation metrics that are reliable in various domains, dimensions, and languages. This track in the 11th Dialogue System Technology Challenge (DSTC11) is part of the ongoing effort to promote robust and multilingual automatic evaluation metrics. This article describes the datasets and baselines provided to participants and discusses the submission and result details of the two proposed subtasks.
2021
DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 2nd Workshop on Computational Approaches to Discourse
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language, making it beneficial for downstream tasks. While previous work has significantly improved the performance of RST discourse parsing, existing parsers are not readily applicable to practical use cases: (1) EDU segmentation is not integrated into most existing tree parsing frameworks, so it is not straightforward to apply such models to new data. (2) Most parsers cannot be used in multilingual scenarios, because they are developed only in English. (3) Parsers trained from single-domain treebanks do not generalize well to out-of-domain inputs. In this work, we propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly. Moreover, we propose a cross-translation augmentation strategy to enable the framework to support multilingual parsing and improve its domain generality. Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
Coreference-Aware Dialogue Summarization
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Summarizing conversations via neural approaches has been gaining research traction lately, yet it is still challenging to obtain practical solutions. Examples of such challenges include unstructured information exchange in dialogues, informal interactions between speakers, and dynamic role changes of speakers as the dialogue evolves. Many of these challenges result in complex coreference links. Therefore, in this work, we investigate different approaches to explicitly incorporate coreference information in neural abstractive dialogue summarization models to tackle the aforementioned challenges. Experimental results show that the proposed approaches achieve state-of-the-art performance, implying it is useful to utilize coreference information in dialogue summarization. Evaluation results on factual correctness suggest such coreference-aware models are better at tracing the information flow among interlocutors and associating accurate status/actions with the corresponding interlocutors and person mentions.
2020
Conditional Neural Generation using Sub-Aspect Functions for Extractive News Summarization
Zhengyuan Liu | Ke Shi | Nancy Chen
Findings of the Association for Computational Linguistics: EMNLP 2020
Much progress has been made in text summarization, fueled by neural architectures using large-scale training corpora. However, in the news domain, neural models easily overfit by leveraging position-related features due to the prevalence of the inverted pyramid writing style. In addition, there is an unmet need to generate a variety of summaries for different users. In this paper, we propose a neural framework that can flexibly control summary generation by introducing a set of sub-aspect functions (i.e. importance, diversity, position). These sub-aspect functions are regulated by a set of control codes to decide which sub-aspect to focus on during summary generation. We demonstrate that extracted summaries with minimal position bias are comparable with those generated by standard models that take advantage of position preference. We also show that news summaries generated with a focus on diversity can be more preferred by human raters. These results suggest that a more flexible neural summarization framework providing more control options could be desirable in tailoring to different user preferences, which is useful since it is often impractical to articulate such preferences for different applications a priori.
Multilingual Neural RST Discourse Parsing
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 28th International Conference on Computational Linguistics
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under the Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, the parsing tasks for other languages such as German, Dutch, and Portuguese are still challenging due to the shortage of annotated data. In this work, we investigate two approaches to establish a neural, cross-lingual discourse parser via: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experimental results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.
Co-authors
- Nancy Chen 4
- Zhengyuan Liu 4
- Chengguang Tang 3
- Shen Gao 2
- Haizhou Li 2
- Shuo Shang 2
- Guohua Tang 2
- Chen Zhang 2
- Zhenwei An 1
- Zongchao Chen 1
- Lisi Chen 1
- Dading Chong 1
- Luis D’Haro 1
- Luis Fernando D’Haro 1
- Ruixiang Feng 1
- Sarik Ghazarian 1
- Feng Guo 1
- Chengrui Huang 1
- Yiming Jia 1
- Menghua Jiang 1
- Feng Jiang (蒋峰) 1
- Ran Le 1
- Zihan Li 1
- Guangyue Peng 1
- Mario Rodríguez-Cantelar 1
- Alexander Rudnicky 1
- João Sedoc 1
- Yang Song 1
- Yifan Wang 1
- Dongsheng Wang 1
- Yuntao Wen 1
- Chen Yang 1
- Jiguo Yu 1
- Tao Zhang 1
- JunShuo Zhang 1
- Chen Zhang 1
- Silin Zhou 1