AURA-QG: Automated Unsupervised Replicable Assessment for Question Generation

Rajshekar K, Harshad Khadilkar, Pushpak Bhattacharyya


Abstract
Question Generation (QG) is central to information retrieval, education, and knowledge assessment, yet its progress is bottlenecked by unreliable and non-scalable evaluation practices. Traditional metrics fall short in structured settings such as document-grounded QG, and human evaluation, while insightful, remains expensive, inconsistent, and difficult to replicate at scale. We introduce AURA-QG, an Automated, Unsupervised, Replicable Assessment pipeline that scores question sets using only the source document. It captures four orthogonal dimensions (answerability, non-redundancy, coverage, and structural entropy) without needing reference questions or relative baselines. Our method is modular, efficient, and agnostic to the question generation strategy. Through extensive experiments across four domains (car manuals, economic surveys, health brochures, and fiction), we demonstrate its robustness across input granularities and prompting paradigms. Chain-of-Thought prompting, which first extracts answer spans and then generates targeted questions, consistently yields higher answerability and coverage, validating the pipeline’s fidelity. The metrics also exhibit strong agreement with human judgments, reinforcing their reliability for practical adoption. The complete implementation of our evaluation pipeline is publicly available.
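The abstract names the four dimensions but does not define them, so the following is only a toy sketch of what reference-free scoring against the source document could look like. It is not the AURA-QG implementation. The function names (`non_redundancy`, `coverage`, `structural_entropy`), the token-overlap proxies, and the `min_shared` threshold are all assumptions made for illustration. Answerability is left out because it would need a question-answering model.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercased word tokens as a set; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def non_redundancy(questions):
    """Assumed proxy: 1 minus the mean pairwise Jaccard overlap between questions."""
    pairs = list(combinations(questions, 2))
    if not pairs:
        return 1.0
    mean_overlap = sum(jaccard(tokens(p), tokens(q)) for p, q in pairs) / len(pairs)
    return 1.0 - mean_overlap


def coverage(document, questions, min_shared=2):
    """Assumed proxy: fraction of document sentences that share at least
    min_shared tokens with some question (stop words are not filtered)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    if not sentences:
        return 0.0
    question_tokens = [tokens(q) for q in questions]
    hit = sum(
        1
        for s in sentences
        if any(len(tokens(s) & qt) >= min_shared for qt in question_tokens)
    )
    return hit / len(sentences)


def structural_entropy(questions):
    """Assumed proxy: Shannon entropy of question-opening words (what, how, ...),
    normalized by the log of the number of distinct openers."""
    firsts = Counter(q.split()[0].lower() for q in questions if q.split())
    total = sum(firsts.values())
    if len(firsts) < 2:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in firsts.values())
    return h / math.log2(len(firsts))


# Hypothetical document and question set, used only to exercise the sketch.
document = (
    "The engine oil should be changed every 10000 km. "
    "Tyre pressure must be checked monthly."
)
questions = [
    "How often should the engine oil be changed?",
    "What must be checked monthly?",
]
scores = {
    "non_redundancy": non_redundancy(questions),
    "coverage": coverage(document, questions),
    "structural_entropy": structural_entropy(questions),
}
```

Each score depends only on the document and the generated questions, which is the property the abstract emphasizes. A real pipeline would presumably replace the token-overlap proxies with stronger semantic measures.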
Anthology ID:
2025.ijcnlp-long.159
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:
IJCNLP | AACL
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages:
2979–2992
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.159/
Cite (ACL):
Rajshekar K, Harshad Khadilkar, and Pushpak Bhattacharyya. 2025. AURA-QG: Automated Unsupervised Replicable Assessment for Question Generation. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 2979–2992, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
AURA-QG: Automated Unsupervised Replicable Assessment for Question Generation (K et al., IJCNLP-AACL 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.159.pdf