Tian Xie
2026
Shorten After You’re Right: Lazy Length Penalties for Reasoning RL
Danlong Yuan | Tian Xie | Shaohan Huang | Huishuai Zhang | Zhuocheng Gong | Chong Luo | Furu Wei | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Long-reasoning models achieve strong accuracy on complex reasoning tasks, but their extended reasoning trajectories incur substantial memory and latency costs. Several existing shortening methods rely on additional supervision or multi-stage post-training, which primarily reduces inference length and does not reduce the rollout tokens during on-policy reinforcement learning (RL). We instead target on-policy response shortening, aiming to improve both inference efficiency and RL training throughput. However, because on-policy RL couples optimization with exploration, naively penalizing length can destabilize training and suppress exploration. To impose length pressure safely, we propose a lazy length penalty integrated into the rule-based RL pipeline: it activates only on correct trajectories, only after training accuracy enters a stably improving regime, and only when responses exceed a tolerance band beyond the minimal correct length. Across four settings, our method significantly reduces response length without extra training stages while maintaining or improving performance. In a logic reasoning setting, we achieve a 40% reduction in step-averaged response length alongside a 14-point gain in performance. For math problems, we reduce step-averaged response length by 33% while preserving performance.
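The three activation conditions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `Rollout` type, the "non-decreasing over the last few steps" stability test, the choice to take the minimal correct length within one group of rollouts, and the `tolerance` and `coeff` values are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Rollout:
    length: int    # response length in tokens
    correct: bool  # result of the rule-based correctness check


def accuracy_is_stably_improving(history: Sequence[float], window: int = 3) -> bool:
    """Hypothetical gate: accuracy did not drop over the last `window` steps."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))


def lazy_length_penalties(
    rollouts: Sequence[Rollout],
    acc_history: Sequence[float],
    tolerance: float = 0.2,  # assumed width of the tolerance band
    coeff: float = 0.5,      # assumed penalty scale
) -> List[float]:
    """Return one penalty per rollout; zero unless all three conditions hold."""
    penalties = [0.0] * len(rollouts)

    # Condition 2: stay inactive until training accuracy is stably improving.
    if not accuracy_is_stably_improving(acc_history):
        return penalties

    correct_lengths = [r.length for r in rollouts if r.correct]
    if not correct_lengths:
        return penalties

    # Condition 3: tolerance band above the shortest correct response
    # (assumed here to be measured within this group of rollouts).
    threshold = min(correct_lengths) * (1.0 + tolerance)

    for i, r in enumerate(rollouts):
        # Condition 1: only correct trajectories are ever penalized.
        if r.correct and r.length > threshold:
            penalties[i] = coeff * (r.length - threshold) / r.length
    return penalties


group = [Rollout(100, True), Rollout(200, True), Rollout(500, False), Rollout(110, True)]
improving = lazy_length_penalties(group, [0.1, 0.2, 0.3, 0.4])
unstable = lazy_length_penalties(group, [0.4, 0.3, 0.2, 0.1])
```

In this sketch only the 200-token correct response is penalized when accuracy is improving. The incorrect 500-token response and the 110-token response inside the band are untouched, and no penalty is applied at all while accuracy is unstable.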
2025
Improving Pre-trained Language Models with Knowledge Enhancement and Filtering Framework
Qi Zhao | Qi Song | Tian Xie | Haiyue Zhang | Hongyu Yang | Xiangyang Li
Findings of the Association for Computational Linguistics: NAACL 2025
Pre-trained language models (PLMs) are widely used in NLP but struggle with capturing entity knowledge. To address this, knowledge enhancement techniques have been proposed. However, existing methods rely heavily on external knowledge base embeddings and often introduce noisy entity representations. In this work, we propose a novel **K**nowledge **E**nhancement **F**iltering **F**ramework named KEFF, which contains both knowledge enhancement and knowledge enhancement filtering modules for PLMs. We find that there are certain redundant bits in the embedding space of PLMs. Building on this insight, we implement knowledge-enhanced mapping of redundant bit values in entity span tokens. To address the noisy entity representations that existing knowledge enhancement methods introduce, we further propose a novel knowledge enhancement filter based on our knowledge enhancement method. Finally, experiments on four knowledge-driven NLP tasks show that our method effectively improves the ability of PLMs on downstream tasks. Compared to state-of-the-art approaches, our method achieves the highest F1-score and accuracy, while reducing the computational cost by 1.7-2.5x.
2023
Salespeople vs SalesBot: Exploring the Role of Educational Value in Conversational Recommender Systems
Lidiya Murakhovs’ka | Philippe Laban | Tian Xie | Caiming Xiong | Chien-Sheng Wu
Findings of the Association for Computational Linguistics: EMNLP 2023
Making big purchases requires consumers to research or consult a salesperson to gain domain expertise. However, existing conversational recommender systems (CRS) often overlook users’ lack of background knowledge, focusing solely on gathering preferences. In this work, we define a new problem space for conversational agents that aim to provide both product recommendations and educational value through mixed-type mixed-initiative dialog. We introduce SalesOps, a framework that facilitates the simulation and evaluation of such systems by leveraging recent advancements in large language models (LLMs). We build SalesBot and ShopperBot, a pair of LLM-powered agents that can simulate either side of the framework. A comprehensive human study compares SalesBot against professional salespeople, revealing that although SalesBot approaches professional performance in terms of fluency and informativeness, it lags behind in recommendation quality. We emphasize the distinct limitations both face in providing truthful information, highlighting the challenges of ensuring faithfulness in the CRS context. We release our code and make all data available.