Zhen Chen
2026
IF-GEO: Conflict-Aware Instruction Fusion for Multi-Query Generative Engine Optimization
Heyang Zhou | Jiajia Chen | Xiaolu Chen | Jie Bao | Zhen Chen | Yong Liao
Findings of the Association for Computational Linguistics: ACL 2026
As Generative Engines revolutionize information retrieval by synthesizing direct answers from retrieved sources, ensuring source visibility becomes a significant challenge. Improving it through targeted content revisions is a practical strategy termed Generative Engine Optimization (GEO). However, optimizing a document for diverse queries presents a constrained optimization challenge where heterogeneous queries often impose conflicting and competing revision requirements under a limited content budget. To address this challenge, we propose IF-GEO, a "diverge-then-converge" framework comprising two phases: (i) mining distinct optimization preferences from representative latent queries; (ii) synthesizing a Global Revision Blueprint for guided editing by coordinating preferences via conflict-aware instruction fusion. To explicitly quantify IF-GEO’s objective of cross-query stability, we introduce risk-aware stability metrics. Experiments on multi-query benchmarks demonstrate that IF-GEO achieves substantial performance gains while maintaining robustness across diverse retrieval scenarios.
Domain-Specific Data Generation Framework for RAG Adaptation
Chris Xing Tian | Weihao Xie | Zhen Chen | Hui Liu | Zhengyuan Yi | Haoliang Li | Shiqi Wang | Siwei Ma
Findings of the Association for Computational Linguistics: ACL 2026
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning capabilities of large language models (LLMs) with external retrieval to produce domain-grounded responses. Effectively adapting RAG systems to domain-specific settings requires specialized, context-rich training data beyond general-purpose question-answering datasets. Here, we propose RAGen, a scalable and modular data-centric framework for generating domain-grounded question–answer–context (QAC) triples tailored to diverse RAG adaptation strategies. These QAC triples serve as training signals for multiple RAG adaptation approaches; in this work, we demonstrate their use for contrastive fine-tuning of embedding models and supervised fine-tuning of LLMs under retrieved contexts. RAGen generates QAC triples by identifying key concepts within documents, producing diverse questions guided by Bloom’s Taxonomy–inspired principles, and pairing them with precise answers extracted from relevant contexts. Its modular pipeline incorporates semantic chunking, hierarchical concept extraction, multi-chunk retrieval, and curated distractor contexts to encourage robust reasoning. Designed for scalability, RAGen efficiently handles large and evolving document corpora without redundant processing, making it particularly suitable for dynamic domains like enterprise knowledge bases.
2025
Beyond Logits: Aligning Feature Dynamics for Effective Knowledge Distillation
Guoqiang Gong | Jiaxing Wang | Jin Xu | Deping Xiang | Zicheng Zhang | Leqi Shen | Yifeng Zhang | JunhuaShu JunhuaShu | ZhaolongXing ZhaolongXing | Zhen Chen | Pengzhang Liu | Ke Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge distillation (KD) compresses large language models (LLMs), known as teacher models, into lightweight versions called student models, enabling efficient inference and downstream applications. However, prevailing approaches focus predominantly on matching the final output distributions of the student and teacher models. Drawing on the perspective that transformers can be viewed as discretizing ordinary differential equations (ODEs) on integer time steps (corresponding to layer indices), where intermediate features evolve across layers, we argue that effective KD requires aligning the entire feature dynamics between teacher and student models, which we call feature dynamics distillation (FDD). This alignment involves matching both the feature trajectory and its first-order derivative, rather than just the final states. Our approach extends the original KD objective with two additional loss terms: layer-wise feature KD, which matches the discretized feature trajectory, and layer feature delta KD, which matches first-order changes in features across adjacent layers. Extensive experiments on various tasks validate the effectiveness of our distillation method.
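The two additional loss terms named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that student and teacher features are already aligned layer by layer and share a hidden size, that mean squared error is the distance, and that the terms are combined with hypothetical weights `alpha` and `beta`; the abstract specifies none of these choices.

```python
from typing import List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def trajectory_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Layer-wise feature term: average MSE over aligned layers (assumed distance)."""
    return sum(mse(s, t) for s, t in zip(student, teacher)) / len(student)


def deltas(features: List[Vector]) -> List[Vector]:
    """First-order changes in features between adjacent layers."""
    return [
        [nxt_x - prev_x for prev_x, nxt_x in zip(prev, nxt)]
        for prev, nxt in zip(features, features[1:])
    ]


def delta_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Layer feature delta term: match the adjacent-layer differences."""
    return trajectory_loss(deltas(student), deltas(teacher))


def fdd_loss(
    logit_kd: float,
    student: List[Vector],
    teacher: List[Vector],
    alpha: float = 1.0,  # hypothetical weight for the trajectory term
    beta: float = 1.0,   # hypothetical weight for the delta term
) -> float:
    """Original KD objective plus the two feature-dynamics terms."""
    return (
        logit_kd
        + alpha * trajectory_loss(student, teacher)
        + beta * delta_loss(student, teacher)
    )


# Toy features: three layers, hidden size two.
student_feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
teacher_feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0]]
total = fdd_loss(0.5, student_feats, teacher_feats)
```

In this toy case only the last layer differs, so both the trajectory term and the delta term are non-zero; identical trajectories would leave only the logit-level KD value.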
2022
Improving Continual Relation Extraction through Prototypical Contrastive Learning
Chengwei Hu | Deqing Yang | Haoliang Jin | Zhen Chen | Yanghua Xiao
Proceedings of the 29th International Conference on Computational Linguistics
Continual relation extraction (CRE) aims to extract relations as new data arrives continuously and iteratively; its major challenge is the catastrophic forgetting of old tasks. To alleviate this problem and improve CRE performance, we propose a novel Continual Relation Extraction framework with Contrastive Learning, namely CRECL, which is built with a classification network and a prototypical contrastive network to achieve the incremental-class learning of CRE. Specifically, in the contrastive network a given instance is contrasted with the prototype of each candidate relation stored in the memory module. This contrastive learning scheme makes the data distributions of all tasks more distinguishable, further alleviating catastrophic forgetting. Our experimental results not only demonstrate CRECL’s advantage over the state-of-the-art baselines on two public datasets, but also verify the effectiveness of CRECL’s contrastive learning in improving performance.
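The idea of contrasting an instance with the prototype of each candidate relation can be sketched as follows. This is an illustration under stated assumptions, not CRECL's actual objective: prototypes are taken to be the mean of stored instance embeddings, similarity is cosine, and the loss is a temperature-scaled softmax cross-entropy over prototypes. The relation names, the `temperature` value, and all function names are hypothetical, and embeddings are assumed to be non-zero.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Prototype of a relation: mean of its stored embeddings (assumption)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prototype_contrastive_loss(
    instance: Vector,
    true_relation: str,
    prototypes: Dict[str, Vector],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Negative log-probability of the true relation's prototype."""
    scores = {
        relation: cosine(instance, proto) / temperature
        for relation, proto in prototypes.items()
    }
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores.values())
    log_denominator = top + math.log(
        sum(math.exp(score - top) for score in scores.values())
    )
    return log_denominator - scores[true_relation]


# Toy memory module with two relations (hypothetical names and embeddings).
memory = {
    "born_in": [[1.0, 0.0], [1.0, 0.0]],
    "works_for": [[0.0, 1.0], [0.0, 1.0]],
}
prototypes = {relation: mean_vector(vecs) for relation, vecs in memory.items()}

loss_correct = prototype_contrastive_loss([1.0, 0.0], "born_in", prototypes)
loss_wrong = prototype_contrastive_loss([1.0, 0.0], "works_for", prototypes)
```

An instance that sits on its own relation's prototype gets a loss near zero, while labeling it with the other relation yields a much larger loss; minimizing this kind of loss pulls instances toward their own prototype and away from the others.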
Co-authors
- Jie Bao 1
- Jiajia Chen 1
- Xiaolu Chen 1
- Guoqiang Gong 1
- Chengwei Hu 1
- Haoliang Jin 1
- JunhuaShu JunhuaShu 1
- Haoliang Li 1
- Yong Liao 1
- Hui Liu 1
- Pengzhang Liu 1
- Siwei Ma 1
- Leqi Shen 1
- Chris Xing Tian 1
- Shiqi Wang 1
- Jiaxing Wang 1
- Deping Xiang 1
- Yanghua Xiao 1
- Weihao Xie 1
- Jin Xu 1
- Deqing Yang 1
- Zhengyuan Yi 1
- Zicheng Zhang 1
- Yifeng Zhang 1
- Ke Zhang 1
- ZhaolongXing ZhaolongXing 1
- Heyang Zhou 1