Evaluation methodologies in Automatic Question Generation 2013-2018

Jacopo Amidei, Paul Piwek, Alistair Willis


Abstract
In the last few years, Automatic Question Generation (AQG) has attracted increasing interest. In this paper we survey the evaluation methodologies used in AQG. Based on a sample of 37 papers, our research shows that the development of AQG systems has not been accompanied by comparable developments in the methodologies used to evaluate them. Indeed, in the papers we examine here, we find a wide variety of both intrinsic and extrinsic evaluation methodologies. Such diverse evaluation practices make it difficult to reliably compare the quality of different generation systems. Our study suggests that, given the rapidly increasing level of research in the area, a common framework is urgently needed to compare the performance of AQG systems, and of NLG systems more generally.
Anthology ID: W18-6537
Volume: Proceedings of the 11th International Conference on Natural Language Generation
Month: November
Year: 2018
Address: Tilburg University, The Netherlands
Editors: Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 307–317
URL: https://aclanthology.org/W18-6537
DOI: 10.18653/v1/W18-6537
Cite (ACL): Jacopo Amidei, Paul Piwek, and Alistair Willis. 2018. Evaluation methodologies in Automatic Question Generation 2013-2018. In Proceedings of the 11th International Conference on Natural Language Generation, pages 307–317, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal): Evaluation methodologies in Automatic Question Generation 2013-2018 (Amidei et al., INLG 2018)
PDF: https://preview.aclanthology.org/ml4al-ingestion/W18-6537.pdf