ArgGenBench: Benchmarking the Complex Controlled Argument Generation Capability of Large Language Models

Bojun Jin, Jianzhu Bao, Yang Sun, Yice Zhang, Ruifeng Xu


Abstract
Argument generation is a fundamental NLP task that aims to automatically produce persuasive arguments. Effective human argumentation is inherently complex and multifaceted, integrating argumentative strategies, appropriate styles, and adaptation to target audiences, among other factors. However, existing studies focus on limited control signals such as topic, stance, or key aspects, failing to capture this complexity. As LLMs advance, the lack of benchmarks evaluating multifaceted argumentative control becomes a critical bottleneck. To address this, we introduce ArgGenBench, a novel benchmark containing complex instructions that integrate multi-dimensional control, including topic, stance, length, style, strategy, audience, and key points. Extensive evaluation across 15 LLMs reveals significant limitations: even the best-performing model achieves only a 42.7% win rate against human-verified references. These results highlight the challenge of controlled argument generation and establish ArgGenBench as a rigorous testbed for developing more capable systems.
Anthology ID:
2026.acl-long.1414
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
30630–30662
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1414/
Cite (ACL):
Bojun Jin, Jianzhu Bao, Yang Sun, Yice Zhang, and Ruifeng Xu. 2026. ArgGenBench: Benchmarking the Complex Controlled Argument Generation Capability of Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30630–30662, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ArgGenBench: Benchmarking the Complex Controlled Argument Generation Capability of Large Language Models (Jin et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1414.pdf
Checklist:
 2026.acl-long.1414.checklist.pdf