Measuring Attribution in Natural Language Generation Models

Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, David Reitter


Abstract
Large neural models have brought a new challenge to natural language generation (NLG): it has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world must be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline that allows annotators to evaluate model output according to annotation guidelines. We successfully validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset). We provide full annotation guidelines in the appendices and publicly release the annotated data at https://github.com/google-research-datasets/AIS.
Anthology ID:
2023.cl-4.2
Volume:
Computational Linguistics, Volume 49, Issue 4 - December 2023
Month:
December
Year:
2023
Address:
Cambridge, MA
Venue:
CL
Publisher:
MIT Press
Pages:
777–840
URL:
https://aclanthology.org/2023.cl-4.2
DOI:
10.1162/coli_a_00486
Cite (ACL):
Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, and David Reitter. 2023. Measuring Attribution in Natural Language Generation Models. Computational Linguistics, 49(4):777–840.
Cite (Informal):
Measuring Attribution in Natural Language Generation Models (Rashkin et al., CL 2023)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2023.cl-4.2.pdf