Hanwen Gu

2026

Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation
Hanwen Gu | Chao Guo | Junle Wang | Wenda Xie | Yisheng Lv
Findings of the Association for Computational Linguistics: ACL 2026

While LLMs demonstrate remarkable fluency in narrative generation, existing methods struggle to maintain global narrative coherence, contextual logical consistency, and smooth character development, often producing monotonous scripts with structural fractures. To this end, we introduce PLOTTER, a framework that performs narrative planning on structural graph representations instead of the direct sequential text representations used in existing work. Specifically, PLOTTER executes an Evaluate-Plan-Revise cycle on the event graph and the character graph. By diagnosing and repairing issues in the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios. These findings verify that performing narrative planning on structural graph representations, rather than on direct text representations, is crucial for enhancing the long-context reasoning of LLMs in complex narrative generation.

2025

UMAD: Enhancing LLM Debiasing via Multi-Agent Debate and Token-Level Bias Interpretation
Hanwen Gu | Jie Ma | Ying Qin | Ling Hu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

Textual data often contain biases that compromise fairness in AI systems, particularly in sensitive areas such as gender, race, and politics. While large language models (LLMs) have shown success across various tasks, they still face limitations due to inherent biases within the models and restrictive safety policies that hinder direct bias mitigation. To overcome these challenges, we propose UMAD (Unsupervised Multi-Agent Debate), a novel framework that leverages a Multi-Agent Debate mechanism alongside Best-Worst Scaling (BWS) to foster more effective discussions among LLMs, facilitating the identification of biases. By combining this with gradient-based interpretation techniques, UMAD extracts token-level bias insights, which are then integrated into models using in-context learning. This enhances the debiasing performance, as shown by our experiments across three bias categories (gender, religion, and politics) using five different LLMs. Our approach demonstrates significant improvements in metrics, with large models matching or even surpassing GPT-4 in Style Accuracy (STA). We release our code at: https://github.com/Couen/UMAD.git.

Co-authors

Junle Wang 1

Wenda Xie 1

Venues

CCL 1
Findings 1
