AIA-BDE: A Corpus of FAQs in Portuguese and their Variations

Hugo Gonçalo Oliveira, João Ferreira, José Santos, Pedro Fialho, Ricardo Rodrigues, Luisa Coheur, Ana Alves


Abstract
We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created manually, by humans, or automatically, with Google Translate. Its aims to be used as a benchmark for FAQ retrieval and automatic question-answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems, or models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial with a set of unsupervised baselines, especially for manually created variations. Besides high performances obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions, and tested when assigning each variation to one of three possible sources, or assigning them as out-of-domain. Here, the difference between manual and automatic variations was not so significant.
Anthology ID:
2020.lrec-1.669
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
5442–5449
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.669
DOI:
Bibkey:
Cite (ACL):
Hugo Gonçalo Oliveira, João Ferreira, José Santos, Pedro Fialho, Ricardo Rodrigues, Luisa Coheur, and Ana Alves. 2020. AIA-BDE: A Corpus of FAQs in Portuguese and their Variations. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5442–5449, Marseille, France. European Language Resources Association.
Cite (Informal):
AIA-BDE: A Corpus of FAQs in Portuguese and their Variations (Gonçalo Oliveira et al., LREC 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.669.pdf