Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque

Maddalen López de Lacalle, Xabier Saralegi, Iñaki San Vicente


Abstract
This paper presents an approach for developing a task-oriented dialog system for less-resourced languages in scenarios where no training data is available. Both intent classification and slot filling are tackled. We project the existing annotations from resource-rich languages by means of Neural Machine Translation (NMT) and subsequent word alignment. We then compare training on the projected monolingual data with direct model transfer alternatives. Intent classifiers and slot-filling sequence taggers are implemented either with a BiLSTM architecture or by fine-tuning BERT transformer models. Models learnt exclusively from Basque projected data provide better accuracies for slot filling. For intent classification, combining Basque projected training data with data from resource-rich languages consistently outperforms models trained solely on projected data. In either case, we achieve competitive performance in both tasks, with accuracies of 81% for intent classification and 77% for slot filling.
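The annotation projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes BIO-style slot labels on the source tokens and a list of (source index, target index) word-alignment pairs produced by some external aligner. The function name `project_slots`, the slot label `datetime`, and the alignment pairs are hypothetical and chosen only for illustration; the sketch is not the authors' implementation.

```python
from typing import List, Optional, Tuple


def project_slots(
    src_labels: List[str],
    alignments: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO slot labels from source tokens onto target tokens.

    Assumption: each alignment is a (source_index, target_index) pair.
    Simplification: adjacent target tokens that receive the same slot
    type are merged into a single span.
    """
    # Slot type (without the B-/I- prefix) assigned to each target token.
    tgt_types: List[Optional[str]] = [None] * tgt_len
    for src_idx, tgt_idx in sorted(alignments):
        label = src_labels[src_idx]
        if label != "O" and tgt_types[tgt_idx] is None:
            tgt_types[tgt_idx] = label.split("-", 1)[1]

    # Re-derive the B-/I- prefixes following the target word order,
    # since translation may reorder the tokens of a slot.
    projected: List[str] = []
    prev: Optional[str] = None
    for slot_type in tgt_types:
        if slot_type is None:
            projected.append("O")
        elif slot_type == prev:
            projected.append("I-" + slot_type)
        else:
            projected.append("B-" + slot_type)
        prev = slot_type
    return projected


# Hypothetical example: a two-token slot at the end of the source
# sentence is aligned to the first two tokens of the translation.
example = project_slots(
    ["O", "O", "B-datetime", "I-datetime"],
    [(0, 3), (1, 2), (2, 0), (3, 1)],
    4,
)
print(example)
```

Target tokens with no aligned source token are labelled `O` in this sketch; a real pipeline would likely need extra handling for unaligned or many-to-one tokens.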
Anthology ID:
2020.lrec-1.340
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2796–2802
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.340
Cite (ACL):
Maddalen López de Lacalle, Xabier Saralegi, and Iñaki San Vicente. 2020. Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 2796–2802, Marseille, France. European Language Resources Association.
Cite (Informal):
Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque (López de Lacalle et al., LREC 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.lrec-1.340.pdf