Fudan Zheng
2026
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization
Xiaoyong Mei | Tingting Zuo | Da Chen | Guangyu Hu | Xiangyu Wen | Chao Duan | Mingyan Zhang | Fudan Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Multi-role dialogue summarization requires modeling complex interactions among multiple speakers while preserving role-specific information and factual consistency. However, most existing methods optimize for automatic metrics such as ROUGE and BERTScore, which favor surface-level imitation of references rather than genuine gains in faithfulness or alignment with human preferences. We propose a novel framework that couples explicit cognitive-style reasoning with reward-based optimization for multi-role dialogue summarization. Our method first distills structured reasoning traces (e.g., step-by-step inferences and intermediate reflections) from a large teacher model and uses them as auxiliary supervision to initialize a reasoning-aware summarizer via staged supervised fine-tuning. It then applies GRPO with a dual-principle reward that blends metric-based signals with human-aligned criteria targeting key information coverage, implicit inference, factual faithfulness, and conciseness. Experiments on multilingual multi-role dialogue benchmarks show that our method matches strong baselines on ROUGE and BERTScore. Specifically, results on CSDS confirm the framework’s stability in semantic consistency, while in-depth analysis on SAMSum demonstrates clear gains in factual faithfulness and model-based preference alignment. These findings underscore the value of reasoning-aware and preference-aware training for reliable dialogue summarization. Code will be made accessible upon acceptance; checkpoints and datasets are now available at https://huggingface.co/NebulaPixel.
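The reward blending and group-relative optimization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal weighting of the four criteria, and the mixing weight `alpha` are all assumptions made for illustration, and the advantage computation follows the commonly described GRPO form (rewards standardized within a group of sampled outputs for the same input).

```python
from statistics import mean, pstdev

# Hypothetical names for the four human-aligned criteria listed in the abstract.
CRITERIA = ("coverage", "inference", "faithfulness", "conciseness")


def blended_reward(metric_score, criteria_scores, alpha=0.5):
    """Mix a metric-based score with the average of the criteria scores.

    alpha is an assumed mixing weight; all scores are assumed to lie in [0, 1].
    """
    preference = mean(criteria_scores[name] for name in CRITERIA)
    return alpha * metric_score + (1.0 - alpha) * preference


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled summaries.

    Each advantage is (reward - group mean) / (group std + eps), so summaries
    better than the group average receive a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, each candidate summary sampled for a dialogue would be scored with `blended_reward`, and the resulting list passed to `group_relative_advantages` before the policy update. How the paper actually weights or computes each component is not stated in the abstract.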
2021
Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning
Weijiang Yu | Yingpeng Wen | Fudan Zheng | Nong Xiao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Recent algorithms for math word problems (MWP) neglect outside knowledge that is not present in the problems. Most of them capture only word-level relationships and fail to build human-like hierarchical reasoning that mines the contextual structure between words and sentences. In this paper, we propose a Reasoning with Pre-trained Knowledge and Hierarchical Structure (RPKHS) network, which contains a pre-trained knowledge encoder and a hierarchical reasoning encoder. First, the pre-trained knowledge encoder reasons over the MWP using outside knowledge from pre-trained transformer-based models. Second, the hierarchical reasoning encoder seamlessly integrates word-level and sentence-level reasoning to bridge the entity and context domains on MWP. Extensive experiments show that RPKHS significantly outperforms state-of-the-art approaches on two large-scale, commonly used datasets, boosting performance from 77.4% to 83.9% on Math23K, from 75.5% to 82.2% on Math23K with 5-fold cross-validation, and from 83.7% to 89.8% on MAWPS. Further ablations demonstrate the effectiveness and interpretability of the proposed method.