Yuefeng Zhan
2026
DUET: Joint Exploration of User–Item Profiles in Recommendation System
Yue Chen | Yifei Sun | Lu Wang | Fangkai Yang | Pu Zhao | Minjie Hong | Yifei Dong | Minghua He | Nan Hu | Jianjin Zhang | Zhiwei Dai | Yuefeng Zhan | Weihao Han | Hao Sun | Qingwei Lin | Weiwei Deng | Feng Sun | Qi Zhang | Saravan Rajmohan | Dongmei Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Traditional recommendation systems represent users and items as dense vectors and learn to align them in a shared latent space for relevance estimation. Recent LLM-based recommenders instead leverage natural-language representations that are easier to interpret and integrate with downstream reasoning modules. This paper studies how to construct effective textual profiles for users and items, and how to align them for recommendation. A central difficulty is that the best profile format is not known a priori: manually designed templates can be brittle and misaligned with task objectives. Moreover, generating user and item profiles independently may produce descriptions that are individually plausible yet semantically inconsistent for a specific user–item pair. We propose Duet, an interaction-aware profile generator that jointly produces user and item profiles conditioned on both user history and item evidence. Duet follows a three-stage procedure: it first turns raw histories and metadata into compact cues, then expands these cues into paired profile prompts and generates profiles from them, and finally optimizes the generation policy with reinforcement learning using downstream recommendation performance as feedback. Experiments on three real-world datasets show that Duet consistently outperforms strong baselines, demonstrating the benefits of template-free profile exploration and joint user–item textual alignment. Project page: https://duet-rec.github.io/.
2025
MAIN: Mutual Alignment Is Necessary for instruction tuning
Fanyi Yang | Jianfeng Liu | Xin Zhang | Haoyu Liu | Xixin Cao | Yuefeng Zhan | Hao Sun | Weiwei Deng | Feng Sun | Qi Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Instruction tuning has empowered large language models (LLMs) to achieve remarkable performance, yet its success heavily depends on the availability of large-scale, high-quality instruction-response pairs. To meet this demand, various methods have been developed to synthesize data at scale. However, current methods for scaling up data generation often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that the quality of instruction-response pairs is determined not by the individual quality of each component, but by the degree of mutual alignment. To address this, we propose a Mutual Alignment Framework (MAIN) which enforces coherence between instructions and responses through mutual constraints. We demonstrate that MAIN generalizes well across model architectures and sizes, achieving state-of-the-art performance on LLaMA, Mistral, and Qwen models across diverse benchmarks. This work underscores the critical role of instruction-response alignment in enabling generalizable and high-quality instruction tuning for LLMs. All code is available from our repository.
Token-level Proximal Policy Optimization for Query Generation
Yichen Ouyang | Lu Wang | Fangkai Yang | Pu Zhao | Chenghua Huang | Jianfeng Liu | Bochen Pang | Yaming Yang | Yuefeng Zhan | Hao Sun | Qingwei Lin | Saravan Rajmohan | Weiwei Deng | Dongmei Zhang | Feng Sun
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Query generation is a critical task for web search engines (e.g., Google, Bing) and recommendation systems. Recently, state-of-the-art query generation methods leverage Large Language Models (LLMs) for their strong capabilities in context understanding and text generation. However, they still struggle to generate high-quality queries that reflect user intent inferred from web search interaction history. In this paper, we propose Token-level Proximal Policy Optimization (TPPO), a novel approach designed to empower LLMs to perform better in query generation through fine-tuning. TPPO is based on the Reinforcement Learning from AI Feedback (RLAIF) paradigm, consisting of a token-level reward model and a token-level proximal policy optimization module to address the sparse reward challenge in traditional RLAIF frameworks. We conducted experiments on both an open-source dataset and an industrial dataset collected from a globally used search engine, demonstrating that TPPO significantly improves the performance of query generation for LLMs and outperforms existing competitors.
GeAR: Generation Augmented Retrieval
Haoyu Liu | Shaohan Huang | Jianfeng Liu | Yuefeng Zhan | Hao Sun | Weiwei Deng | Feng Sun | Furu Wei | Qi Zhang
Findings of the Association for Computational Linguistics: ACL 2025
Document retrieval techniques are essential for developing large-scale information systems. The common approach involves using a bi-encoder to compute the semantic similarity between a query and documents. However, the scalar similarity often fails to reflect enough information, hindering the interpretation of retrieval results. In addition, this process primarily focuses on global semantics, overlooking the finer-grained semantic relationships between the query and the document’s content. In this paper, we introduce a novel method, Generation Augmented Retrieval (GeAR), which not only improves the global document-query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules. This enables GeAR to generate relevant context within the documents based on a given query, facilitating learning to retrieve local fine-grained information. Furthermore, when used as a retriever, GeAR does not incur any additional computational cost over bi-encoders. GeAR exhibits competitive retrieval performance across diverse scenarios and tasks. Moreover, qualitative analysis and the results generated by GeAR provide novel insights into the interpretation of retrieval results. The code, data, and models will be released at https://github.com/microsoft/LMOps.
2024
Se2: Sequential Example Selection for In-Context Learning
Haoyu Liu | Jianfeng Liu | Shaohan Huang | Yuefeng Zhan | Hao Sun | Weiwei Deng | Furu Wei | Qi Zhang
Findings of the Association for Computational Linguistics: ACL 2024
The remarkable capability of large language models (LLMs) for in-context learning (ICL) needs to be activated by demonstration examples. Prior work has extensively explored the selection of examples for ICL, predominantly following the “select then organize” paradigm; such approaches often neglect the internal relationships between examples and suffer from an inconsistency between training and inference. In this paper, we formulate the problem as a Sequential Selection problem and introduce Se2, a sequential-aware method that leverages the LLM’s feedback on varying context, aiding in capturing inter-relationships and sequential information among examples, significantly enriching the contextuality and relevance of ICL prompts. Meanwhile, we utilize beam search to seek and construct example sequences, enhancing both quality and diversity. Extensive experiments across 23 NLP tasks from 8 distinct categories illustrate that Se2 markedly surpasses competitive baselines and achieves 42% relative improvement over random selection. Further in-depth analysis shows the effectiveness of the proposed strategies, highlighting Se2’s exceptional stability and adaptability across various scenarios. Code available at https://github.com/microsoft/LMOps.
2023
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Daixuan Cheng | Shaohan Huang | Junyu Bi | Yuefeng Zhan | Jianfeng Liu | Yujing Wang | Hao Sun | Furu Wei | Weiwei Deng | Qi Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifically, we demonstrate universality in a cross-task and cross-model scenario: the retriever is tuned on diverse tasks, but tested on unseen task types; we use a small frozen LLM, GPT-Neo-2.7B, for tuning the retriever, but test the retriever on different LLMs of much larger scales, such as BLOOM-7.1B, OPT-66B and GPT3-175B. Additionally, we show that UPRISE mitigates the hallucination problem in our experiments with ChatGPT, suggesting its potential to improve even the strongest LLMs. Our model and code are available at https://github.com/microsoft/LMOps.
2022
Snapshot-Guided Domain Adaptation for ELECTRA
Daixuan Cheng | Shaohan Huang | Jianfeng Liu | Yuefeng Zhan | Hao Sun | Furu Wei | Denvy Deng | Qi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022
Discriminative pre-trained language models, such as ELECTRA, have achieved promising performance in a variety of general tasks. However, these generic pre-trained models struggle to capture the domain-specific knowledge required by domain-related tasks. In this work, we propose a novel domain-adaptation method for ELECTRA, which can dynamically select domain-specific tokens and guide the discriminator to emphasize them, without introducing new training parameters. We show that by re-weighting the losses of domain-specific tokens, ELECTRA can be effectively adapted to different domains. The experimental results in both computer science and biomedical domains show that the proposed method can achieve state-of-the-art results on the domain-related tasks.
Co-authors
- Weiwei Deng 6
- Jianfeng Liu 6
- Qi Zhang 5
- Shaohan Huang 4
- Hao Sun 4
- Feng Sun 4
- Furu Wei 4
- Haoyu Liu 3
- Hao Sun 3
- Daixuan Cheng 2
- Qingwei Lin 2
- Saravan Rajmohan 2
- Fangkai Yang 2
- Dongmei Zhang 2
- Pu Zhao 2
- Junyu Bi 1
- Xixin Cao 1
- Yue Chen 1
- Zhiwei Dai 1
- Denvy Deng 1
- Yifei Dong 1
- Weihao Han 1
- Minghua He 1
- Minjie Hong 1
- Nan Hu 1
- Chenghua Huang 1
- Yichen Ouyang 1
- Bochen Pang 1
- Yifei Sun 1
- Yujing Wang 1
- Lu Wang 1
- Lu Wang 1
- Fanyi Yang 1
- Yaming Yang 1
- Xin Zhang 1
- Jianjin Zhang 1
- Qi Zhang 1