Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature

Arantxa Otegi; Jon Ander Campos; Gorka Azkune; Aitor Soroa; Eneko Agirre

doi:10.18653/v1/2020.nlpcovid19-2.15

Abstract

We present a Question Answering (QA) system that won one of the tasks of the Kaggle CORD-19 Challenge, according to the qualitative evaluation of experts. The system combines an Information Retrieval module with a reading comprehension module that finds the answers in the retrieved passages. In this paper we present a quantitative and qualitative analysis of the system. The quantitative evaluation using manually annotated datasets contradicted some of our design choices, e.g. our belief that using QuAC for fine-tuning provided better answers than using SQuAD alone. We analyzed this mismatch with an additional A/B test, which showed that the system using QuAC was indeed preferred by users, confirming our intuition. Our analysis calls into question the suitability of automatic metrics and their correlation with user preferences. We also show that automatic metrics are highly dependent on the characteristics of the gold standard, such as the average length of the answers.

Anthology ID: 2020.nlpcovid19-2.15
Volume: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
Month: December
Year: 2020
Address: Online
Venue: NLP-COVID19
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2020.nlpcovid19-2.15
DOI: 10.18653/v1/2020.nlpcovid19-2.15
Cite (ACL): Arantxa Otegi, Jon Ander Campos, Gorka Azkune, Aitor Soroa, and Eneko Agirre. 2020. Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.
Cite (Informal): Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature (Otegi et al., NLP-COVID19 2020)
PDF: https://preview.aclanthology.org/ingestion-script-update/2020.nlpcovid19-2.15.pdf
Video: https://slideslive.com/38939858
Data: CORD-19, QuAC
