The rapid advancement of large language models (LLMs) has unlocked transformative potential for role-playing emotional companion products, enabling systems that support emotional well-being, educational development, and therapeutic applications. However, existing approaches often lack sustained personalization and contextual adaptability, limiting their effectiveness in real-world settings. In this paper, we introduce iPET, an LLM-powered virtual pet agent designed to enhance user engagement through rich, dynamic pet behaviors and interactions tailored to individual preferences. iPET comprises three core components: a dialogue module that instantiates virtual pet agents for emotionally interactive conversations; a memory module that stores and synthesizes records of both agent and user experiences; and a world simulation module that generates diverse, preference-driven pet behaviors guided by high-level reflections. Deployed for over 200 days in a real-world, non-commercial product, iPET has served millions of users, providing emotional support to psychologically distressed individuals and demonstrating its effectiveness in practical applications.
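A minimal sketch of how such a three-module agent loop might be wired together; the `llm` callable, the prompts, and all module interfaces below are illustrative assumptions, not iPET's actual implementation:

```python
# Hypothetical three-module pet-agent loop (dialogue / memory / world simulation).
# Everything here is an assumption for illustration, not iPET's real design.
from dataclasses import dataclass, field
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in/text-out model wrapper


@dataclass
class MemoryModule:
    """Stores raw interaction records and synthesizes high-level reflections."""
    records: List[str] = field(default_factory=list)
    reflections: List[str] = field(default_factory=list)

    def store(self, record: str) -> None:
        self.records.append(record)

    def reflect(self, llm: LLM) -> None:
        # Periodically condense recent records into a reusable reflection.
        recent = "\n".join(self.records[-20:])
        self.reflections.append(llm(f"Summarize these pet/user experiences:\n{recent}"))


def pet_turn(llm: LLM, memory: MemoryModule, user_msg: str) -> str:
    # World simulation: propose a preference-driven behavior from reflections.
    context = "\n".join(memory.reflections[-3:]) or "no reflections yet"
    behavior = llm(f"Given these reflections, propose one pet behavior:\n{context}")
    # Dialogue module: reply in the pet persona, conditioned on that behavior.
    reply = llm(f"You are a virtual pet currently {behavior}. The user says: {user_msg}")
    memory.store(f"user: {user_msg} | pet: {reply}")
    return reply
```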
Despite the significant progress of large language models (LLMs) on various tasks, they often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which augments LLMs with external knowledge sources, offers a promising solution. However, such methods can be misled by irrelevant paragraphs in retrieved documents: because of the inherent uncertainty in LLM generation, feeding in an entire document may introduce off-topic information, causing the model to drift from the central topic and reducing the relevance of the generated content. To address these issues, we propose the Retrieve-Plan-Generation (RPG) framework. In the plan stage, RPG generates plan tokens that guide subsequent generation. In the answer stage, the model selects fine-grained paragraphs relevant to the plan and uses them for further answer generation. This plan-answer process repeats iteratively until completion, improving relevance by keeping generation focused on specific topics. To implement this framework efficiently, we use a simple but effective multi-task prompt-tuning method that enables existing LLMs to handle both planning and answering. We comprehensively compare RPG with baselines across five knowledge-intensive generation tasks, demonstrating the effectiveness of our approach.
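A minimal sketch of an iterative plan-then-answer loop in the spirit of RPG, assuming a generic `llm` text-completion callable and a `retriever` that returns documents; the prompts, the paragraph-relevance check, and the `[EOS]` stop marker are assumptions for illustration rather than the paper's prompt-tuned implementation:

```python
# Illustrative plan-answer loop: plan a sub-topic, select only the paragraphs
# relevant to that plan, extend the answer, and repeat until completion.
from typing import Callable, List

def retrieve_plan_generate(llm: Callable[[str], str],
                           retriever: Callable[[str], List[str]],
                           question: str,
                           max_steps: int = 5) -> str:
    answer_so_far = ""
    for _ in range(max_steps):
        # Plan stage: emit a short plan for the next sub-topic to cover.
        plan = llm(f"Question: {question}\nAnswer so far: {answer_so_far}\n"
                   f"Next sub-topic plan (or [EOS] if done):").strip()
        if plan == "[EOS]":  # assumed completion marker
            break
        # Select fine-grained paragraphs relevant to the current plan.
        paragraphs = [p for doc in retriever(question) for p in doc.split("\n\n")]
        relevant = [p for p in paragraphs
                    if llm(f"Is this paragraph relevant to '{plan}'? yes/no:\n{p}")
                    .strip().lower().startswith("yes")]
        # Answer stage: continue the answer conditioned on the selected evidence.
        answer_so_far += llm(f"Plan: {plan}\nEvidence:\n" + "\n".join(relevant) +
                             f"\nContinue the answer to: {question}\n"
                             f"Current answer: {answer_so_far}")
    return answer_so_far
```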
With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective way to mitigate these costs is to compress long input contexts. Existing methods typically leverage the self-attention mechanism of the large model itself for context compression; while they achieve notable results, the compression process still entails quadratic complexity. To address this limitation, we propose the In-Context Former (IC-Former). Rather than relying on the target large model, IC-Former uses cross-attention to extract and condense information from the contextual embeddings, so its computational overhead grows linearly with the compression range. Experimental results show that our method requires only 1/32 of the baseline's floating-point operations during compression and improves processing speed by 68 to 112 times, while achieving 90% of the baseline performance on evaluation metrics. Additionally, IC-Former exhibits strong regularity in its interactions with the context, enhancing its interpretability. Overall, IC-Former significantly reduces compression costs and makes real-time compression scenarios feasible.
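An illustrative PyTorch sketch of the core idea, compressing a long context into a fixed number of digest embeddings via cross-attention so that cost scales linearly with context length; the single-layer design, dimensions, and names are assumptions, not the actual IC-Former architecture:

```python
# Learnable digest tokens attend over frozen context embeddings once, so the
# attention cost is num_digests x seq_len rather than seq_len x seq_len.
import torch
import torch.nn as nn


class CrossAttentionCompressor(nn.Module):
    def __init__(self, dim: int = 768, num_digests: int = 128, num_heads: int = 8):
        super().__init__()
        self.digests = nn.Parameter(torch.randn(num_digests, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, context_emb: torch.Tensor) -> torch.Tensor:
        # context_emb: (batch, seq_len, dim) embeddings of the long context.
        batch = context_emb.size(0)
        queries = self.digests.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.cross_attn(queries, context_emb, context_emb)
        return compressed + self.ffn(compressed)  # (batch, num_digests, dim)
```

In such a setup the compressed digests would then be fed to the downstream LLM in place of the full context; how they are aligned with the target model's input space is left out of this sketch.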