Yifei Zhang

Other people with similar names: Yifei Zhang

Unverified author pages with similar names: Yifei Zhang

2025

From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models
Zhihan Guo | Jiele Wu | Wenqian Cui | Yifei Zhang | Minda Hu | Yufei Wang | Irwin King
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Current research on long-form context in Large Language Models (LLMs) primarily focuses on understanding long contexts, while **Open-ended Long Text Generation** (Open-LTG) remains insufficiently explored. Training a long text generation model requires curated gold-standard reference data, which typically does not exist for informative Open-LTG tasks; moreover, previous methods rely only on general assessments as reward signals, which limits accuracy. To bridge this gap, we introduce **ProxyReward**, a reinforcement learning (RL) based framework comprising a data synthesis method and a novel reward signal. First, the **ProxyReward Dataset** is synthesized automatically from simple prompts, obviating the need for extensive labeled data or significant manual effort. Second, the **ProxyReward Signal** provides a targeted evaluation of information comprehensiveness and accuracy for specific questions. Experimental results show that ProxyReward **surpasses even GPT-4-Turbo**: it improves performance on the Open-LTG task by 20% when training widely used open-source models, and it also outperforms the LLM-as-a-Judge approach. Our work presents effective methods for enhancing the ability of LLMs to address complex open-ended questions posed by humans.
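
The key idea is that the reward is question-specific rather than a single generic quality score. As a purely illustrative sketch (the function name, inputs, and weighting below are assumptions, not the paper's implementation), a targeted reward of this kind could combine per-key-point coverage with per-claim correctness:

```python
from typing import List

def proxy_reward(coverage_flags: List[bool], accuracy_flags: List[bool],
                 alpha: float = 0.5) -> float:
    """Hypothetical targeted reward: mix comprehensiveness (fraction of
    question-specific key points the generation covers) with accuracy
    (fraction of its claims judged correct) into one scalar in [0, 1].
    The flags would typically come from an automatic judge."""
    if not coverage_flags or not accuracy_flags:
        return 0.0
    comprehensiveness = sum(coverage_flags) / len(coverage_flags)
    accuracy = sum(accuracy_flags) / len(accuracy_flags)
    return alpha * comprehensiveness + (1 - alpha) * accuracy

# Example: 3 of 4 key points covered, 5 of 6 claims judged correct.
print(round(proxy_reward([True, True, True, False], [True] * 5 + [False]), 3))  # 0.792
```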