FQuAD2.0: French Question Answering and Learning When You Don’t Know

Quentin Heinrich, Gautier Viaud, Wacim Belblidia


Abstract
Question Answering, including Reading Comprehension, is one of the NLP research areas that has seen significant scientific breakthroughs over the past few years, thanks to concomitant advances in Language Modeling. Most of these breakthroughs, however, are centered on the English language. In 2020, as a first strong initiative to bridge the gap to the French language, Illuin Technology introduced FQuAD1.1, a native French Reading Comprehension dataset composed of 60,000+ question-answer samples extracted from Wikipedia articles. Nonetheless, Question Answering models trained on this dataset have a major drawback: they cannot predict when a given question has no answer in the paragraph of interest, and therefore make unreliable predictions in various industrial use cases. We introduce FQuAD2.0, which extends FQuAD with 17,000+ unanswerable questions, annotated adversarially so that they resemble answerable ones. This new dataset, comprising a total of almost 80,000 questions, makes it possible to train French Question Answering models that can distinguish unanswerable questions from answerable ones. We benchmark several models on this dataset: our best model, a fine-tuned CamemBERT-large, achieves an F1 score of 82.3% on this classification task and an F1 score of 83% on the Reading Comprehension task.
Anthology ID:
2022.lrec-1.237
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2205–2214
URL:
https://aclanthology.org/2022.lrec-1.237
Cite (ACL):
Quentin Heinrich, Gautier Viaud, and Wacim Belblidia. 2022. FQuAD2.0: French Question Answering and Learning When You Don’t Know. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2205–2214, Marseille, France. European Language Resources Association.
Cite (Informal):
FQuAD2.0: French Question Answering and Learning When You Don’t Know (Heinrich et al., LREC 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.lrec-1.237.pdf
Data
BoolQ, CoQA, DROP, FQuAD, Natural Questions, SQuAD