Abstract
As summarization systems driven by deep neural networks continue to improve, researchers place higher demands on the quality of generated summaries, which should be not only fluent and informative but also factually correct. As a result, the field of factual evaluation has developed rapidly in recent years. Despite initial progress in evaluating generated summaries, existing meta-evaluation methodologies for factuality metrics lack transparency, leaving the relative advantages of factuality metrics and their applicability insufficiently understood. In this paper, we present an adversarial meta-evaluation methodology that allows us to (i) diagnose the fine-grained strengths and weaknesses of six existing top-performing metrics over 24 diagnostic test datasets, and (ii) search for directions for further improvement via data augmentation. Our observations from this work motivate us to propose several calls for future research. We make all code, diagnostic test datasets, and trained factuality models available: https://github.com/zide05/AdvFact
- Anthology ID: 2021.findings-emnlp.179
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
- Month: November
- Year: 2021
- Address: Punta Cana, Dominican Republic
- Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue: Findings
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 2082–2095
- URL: https://aclanthology.org/2021.findings-emnlp.179
- DOI: 10.18653/v1/2021.findings-emnlp.179
- Cite (ACL): Yiran Chen, Pengfei Liu, and Xipeng Qiu. 2021. Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2082–2095, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization (Chen et al., Findings 2021)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2021.findings-emnlp.179.pdf
- Code: zide05/advfact
- Data: CNN/Daily Mail, MultiNLI