Tian Xie


2026

Long-reasoning models achieve strong accuracy on complex reasoning tasks, but their extended reasoning trajectories incur substantial memory and latency costs. Several existing shortening methods rely on additional supervision or multi-stage post-training, which primarily reduces inference length and does not reduce the rollout tokens during on-policy reinforcement learning (RL). We instead target on-policy response shortening, aiming to improve both inference efficiency and RL training throughput. However, because on-policy RL couples optimization with exploration, naively penalizing length can destabilize training and suppress exploration. To impose length pressure safely, we propose a lazy length penalty integrated into the rule-based RL pipeline: it activates only on correct trajectories, only after training accuracy enters a stably improving regime, and only when responses exceed a tolerance band beyond the minimal correct length. Across four settings, our method significantly reduces response length without extra training stages while maintaining or improving performance. In a logic reasoning setting, we achieve a 40% reduction in step-averaged response length alongside a 14-point gain in performance. For math problems, we reduce step-averaged response length by 33% while preserving performance.

2025

Pre-trained language models (PLMs) are widely used in NLP but struggle with capturing entity knowledge. To address this, knowledge enhancement techniques have been proposed. However, existing methods rely heavily on external knowledge bases embedding and often introduce noisy entity representations. In this work, we propose a novel **K**nowledge **E**nhancement **F**iltering **F**ramework named KEFF, which contains both knowledge enhancement and knowledge enhancement filtering modules for PLM. We find that there are certain redundant bits in the embedding space of PLMs. Building on this insight, we implement knowledge-enhanced mapping of redundant bit values in entity span tokens. In order to solve the knowledge enhancement problem of existing methods that introduce noisy entity representation knowledge, we further propose a novel knowledge enhancement filter based on our knowledge enhancement method. Finally, experiments on four knowledge-driven NLP tasks show that our method effectively improves the ability of PLMs on downstream tasks. Compared to state-of-the-art approachs, our method achieves the highest F1-score and accuracy, while reducing the computational cost by 1.7-2.5x.

2023

Making big purchases requires consumers to research or consult a salesperson to gain domain expertise. However, existing conversational recommender systems (CRS) often overlook users’ lack of background knowledge, focusing solely on gathering preferences. In this work, we define a new problem space for conversational agents that aim to provide both product recommendations and educational value through mixed-type mixed-initiative dialog. We introduce SalesOps, a framework that facilitates the simulation and evaluation of such systems by leveraging recent advancements in large language models (LLMs). We build SalesBot and ShopperBot, a pair of LLM-powered agents that can simulate either side of the framework. A comprehensive human study compares SalesBot against professional salespeople, revealing that although SalesBot approaches professional performance in terms of fluency and informativeness, it lags behind in recommendation quality. We emphasize the distinct limitations both face in providing truthful information, highlighting the challenges of ensuring faithfulness in the CRS context. We release our code and make all data available.