QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

Xiaoqiang Wang, Bang Liu, Siliang Tang, Lingfei Wu


Abstract
Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, and thus lack a deep understanding of the relevance between the generated questions and the input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning over the context or (2) can be grounded by multiple pieces of evidence in the context. In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with complicated reasoning and diverse generation from multiple pieces of evidence, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore achieves a higher correlation with human judgments while being much more robust to adversarial samples.
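To make the word-level matching idea concrete, the sketch below shows one generic way a reference-free relevance signal between a candidate question and its context could be computed with off-the-shelf BERT embeddings: each question token is greedily matched to its most similar context token and the maxima are averaged. This is an illustration only, not the authors' QRelScore formulation; the checkpoint name and the max/mean aggregation are assumptions.

```python
# Illustrative sketch only: greedy token-level matching between a generated
# question and its source context using off-the-shelf BERT embeddings.
# Not the exact QRelScore metric; model choice and aggregation are assumed.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; any BERT-like encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Return L2-normalized contextual token embeddings (one row per subword)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model(**inputs).last_hidden_state.squeeze(0)  # (seq_len, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)

def word_level_relevance(question: str, context: str) -> float:
    """Greedily match each question token to its most similar context token
    and return the mean of those maximum cosine similarities."""
    q_emb = embed(question)   # (|q|, dim)
    c_emb = embed(context)    # (|c|, dim)
    sim = q_emb @ c_emb.T     # cosine similarities (rows are normalized)
    return sim.max(dim=1).values.mean().item()

if __name__ == "__main__":
    context = ("The Eiffel Tower, completed in 1889, was designed by the "
               "engineer Gustave Eiffel and stands in Paris.")
    question = "Who designed the tower completed in 1889 in Paris?"
    print(f"word-level relevance ~ {word_level_relevance(question, context):.3f}")
```

Such a token-level score rewards questions whose content words are grounded in the context; QRelScore's contribution, per the abstract, is to go beyond this with hierarchical matching and GPT2-based prompt generation to handle complicated reasoning and questions supported by multiple evidence spans.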
Anthology ID:
2022.emnlp-main.37
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
562–581
URL:
https://aclanthology.org/2022.emnlp-main.37
DOI:
10.18653/v1/2022.emnlp-main.37
Cite (ACL):
Xiaoqiang Wang, Bang Liu, Siliang Tang, and Lingfei Wu. 2022. QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 562–581, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance (Wang et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.37.pdf
Software:
2022.emnlp-main.37.software.zip