Hao Sun

2025

MAIN: Mutual Alignment Is Necessary for Instruction Tuning
Fanyi Yang | Jianfeng Liu | Xin Zhang | Haoyu Liu | Xixin Cao | Yuefeng Zhan | Hao Sun | Weiwei Deng | Feng Sun | Qi Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Instruction tuning has empowered large language models (LLMs) to achieve remarkable performance, yet its success heavily depends on the availability of large-scale, high-quality instruction-response pairs. To meet this demand, various methods have been developed to synthesize data at scale. However, current methods for scaling up data generation often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that the quality of instruction-response pairs is determined not by the individual quality of each component, but by their degree of mutual alignment. To address this, we propose a Mutual Alignment Framework (MAIN), which enforces coherence between instructions and responses through mutual constraints. We demonstrate that MAIN generalizes well across model architectures and sizes, achieving state-of-the-art performance on LLaMA, Mistral, and Qwen models across diverse benchmarks. This work underscores the critical role of instruction-response alignment in enabling generalizable and high-quality instruction tuning for LLMs. All code is available from our repository.
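
To make the mutual-alignment idea concrete, here is a minimal sketch (not the paper's released method): score each instruction-response pair by conditional likelihood in both directions under a reference LM, and keep only pairs where each side predicts the other. The bidirectional scoring rule, the gpt2 reference model, the prompt templates, and the filtering threshold are all illustrative assumptions.

# Hypothetical sketch of mutual-alignment scoring for instruction tuning data.
# Assumption: alignment is approximated by bidirectional conditional
# log-likelihoods under a reference LM; the paper's constraints may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder reference model
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def cond_logprob(context: str, target: str) -> float:
    """Average log p(target | context) under the reference LM."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    tgt_ids = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    logp = torch.log_softmax(lm(ids).logits[:, :-1], dim=-1)
    # Score only the target tokens, conditioned on everything before them.
    tgt_positions = range(ctx_ids.size(1) - 1, ids.size(1) - 1)
    scores = [logp[0, pos, ids[0, pos + 1]] for pos in tgt_positions]
    return torch.stack(scores).mean().item()

def mutual_alignment(instruction: str, response: str) -> float:
    # Symmetric score: a pair is well aligned only if each side predicts the other.
    fwd = cond_logprob(f"Instruction: {instruction}\nResponse: ", response)
    bwd = cond_logprob(f"Response: {response}\nInstruction: ", instruction)
    return min(fwd, bwd)  # the weaker direction bounds overall alignment

pairs = [("List three primary colors.", "Red, yellow, and blue.")]
kept = [p for p in pairs if mutual_alignment(*p) > -4.0]  # tunable threshold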

Token-level Proximal Policy Optimization for Query Generation
Yichen Ouyang | Lu Wang | Fangkai Yang | Pu Zhao | Chenghua Huang | Jianfeng Liu | Bochen Pang | Yaming Yang | Yuefeng Zhan | Hao Sun | Qingwei Lin | Saravan Rajmohan | Weiwei Deng | Dongmei Zhang | Feng Sun
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Query generation is a critical task for web search engines (e.g., Google, Bing) and recommendation systems. Recently, state-of-the-art query generation methods have leveraged Large Language Models (LLMs) for their strong capabilities in context understanding and text generation. However, they still face challenges in generating high-quality queries, particularly in inferring user intent from web search interaction history. In this paper, we propose Token-level Proximal Policy Optimization (TPPO), a novel approach designed to empower LLMs to perform better in query generation through fine-tuning. TPPO is based on the Reinforcement Learning from AI Feedback (RLAIF) paradigm, consisting of a token-level reward model and a token-level proximal policy optimization module that together address the sparse reward challenge in traditional RLAIF frameworks. We conducted experiments on both an open-source dataset and an industrial dataset collected from a globally used search engine, demonstrating that TPPO significantly improves the performance of query generation for LLMs and outperforms existing competitors.
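
As a hedged illustration of the token-level idea, the sketch below assumes a reward model that already emits one reward per generated token (the dense signal that sidesteps the sparse-reward problem) and applies a standard clipped PPO update with per-token GAE advantages; the estimator and hyperparameters are conventional defaults, not necessarily the paper's.

# Illustrative sketch of a token-level PPO update (not the paper's code).
# Assumption: a reward model yields one reward per generated token rather
# than a single sequence-level reward, densifying the training signal.
import torch

def tppo_loss(logp_new, logp_old, token_rewards, values,
              clip_eps=0.2, gamma=1.0, lam=0.95):
    """Clipped PPO objective with per-token GAE advantages."""
    T = token_rewards.size(0)
    advantages = torch.zeros(T)
    next_value, gae = 0.0, 0.0  # terminal value after the last token is 0
    for t in reversed(range(T)):
        delta = token_rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    ratio = torch.exp(logp_new - logp_old)  # importance ratio per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage: 5 generated tokens, each with its own reward.
T = 5
loss = tppo_loss(torch.randn(T), torch.randn(T), torch.rand(T), torch.randn(T))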

GeAR: Generation Augmented Retrieval
Haoyu Liu | Shaohan Huang | Jianfeng Liu | Yuefeng Zhan | Hao Sun | Weiwei Deng | Feng Sun | Furu Wei | Qi Zhang
Findings of the Association for Computational Linguistics: ACL 2025

Document retrieval techniques are essential for developing large-scale information systems. The common approach involves using a bi-encoder to compute the semantic similarity between a query and documents. However, the scalar similarity often fails to reflect enough information, hindering the interpretation of retrieval results. In addition, this process primarily focuses on global semantics, overlooking the finer-grained semantic relationships between the query and the document’s content. In this paper, we introduce a novel method, Generation Augmented Retrieval (GeAR), which not only improves the global document-query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules. This enables GeAR to generate relevant context within the documents based on a given query, facilitating learning to retrieve local fine-grained information. Furthermore, when used as a retriever, GeAR does not incur any additional computational cost over bi-encoders. GeAR exhibits competitive retrieval performance across diverse scenarios and tasks. Moreover, qualitative analysis and the results generated by GeAR provide novel insights into the interpretation of retrieval results. The code, data, and models will be released at https://github.com/microsoft/LMOps.
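
A minimal sketch of what such joint training could look like, assuming the generation side reduces to a cross-entropy loss over query-conditioned target spans and the retrieval side to in-batch InfoNCE; the fusion and decoding modules are abstracted away as precomputed generator logits, and the weight alpha is a hypothetical hyperparameter.

# Hedged sketch of GeAR-style joint training: a standard bi-encoder
# contrastive loss plus an auxiliary generation loss. The actual fusion
# and decoder design in the paper is more elaborate than shown here.
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, d_emb, temperature=0.05):
    """InfoNCE over in-batch negatives: matching (q, d) pairs on the diagonal."""
    sims = q_emb @ d_emb.T / temperature
    labels = torch.arange(q_emb.size(0))
    return F.cross_entropy(sims, labels)

def joint_loss(q_emb, d_emb, gen_logits, gen_targets, alpha=1.0):
    """Retrieval loss plus generation loss over query-conditioned spans."""
    l_ret = contrastive_loss(q_emb, d_emb)
    l_gen = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)),
                            gen_targets.view(-1), ignore_index=-100)
    return l_ret + alpha * l_gen  # alpha balances retrieval vs. generation

# Toy shapes: batch of 8, 128-dim embeddings, 16-token targets, 30k vocab.
B, D, L, V = 8, 128, 16, 30000
loss = joint_loss(F.normalize(torch.randn(B, D), dim=-1),
                  F.normalize(torch.randn(B, D), dim=-1),
                  torch.randn(B, L, V), torch.randint(0, V, (B, L)))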

Alleviating Performance Degradation Caused by Out-of-Distribution Issues in Embedding-Based Retrieval
Haotong Bao | Jianjin Zhang | Qi Chen | Weihao Han | Zhengxin Zeng | Ruiheng Chang | Mingzheng Li | Hao Sun | Weiwei Deng | Feng Sun | Qi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025

In Embedding-Based Retrieval (EBR), Approximate Nearest Neighbor (ANN) algorithms are widely adopted for efficient large-scale search. However, recent studies reveal a query out-of-distribution (OOD) issue, where query and base embeddings follow mismatched distributions, significantly degrading ANN performance. In this work, we empirically verify the generality of this phenomenon and provide a quantitative analysis. To mitigate the distributional gap, we introduce a distribution regularizer into the encoder training objective, encouraging alignment between query and base embeddings. Extensive experiments across multiple datasets, encoders, and ANN indices show that our method consistently improves retrieval performance.
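
One plausible instantiation of such a regularizer, sketched under the assumption that it matches batch moments of the two embedding distributions (the paper's exact formulation may differ):

# Hedged sketch of a distribution regularizer for EBR encoder training.
# Assumption: alignment is encouraged by penalizing the gap between batch
# statistics of query and base (document) embeddings; beta is illustrative.
import torch

def distribution_gap(q_emb, d_emb):
    """Moment-matching penalty: mean and variance gaps between the two sets."""
    mean_gap = (q_emb.mean(0) - d_emb.mean(0)).pow(2).sum()
    var_gap = (q_emb.var(0) - d_emb.var(0)).pow(2).sum()
    return mean_gap + var_gap

def regularized_loss(task_loss, q_emb, d_emb, beta=0.1):
    # Total objective: the original retrieval loss plus the alignment penalty.
    return task_loss + beta * distribution_gap(q_emb, d_emb)

The penalty vanishes when query and base batches share first and second moments, which is one simple way to pull the two distributions together without altering the retrieval objective itself.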