Abstract
A typical cross-lingual transfer learning approach for boosting model performance in a target language is to pre-train the model on all available supervised data from another (source) language. However, in large-scale systems this leads to high training times and computational requirements. In addition, characteristic differences between the source and target languages raise the natural question of whether source data selection can improve knowledge transfer. In this paper, we address this question and propose a simple but effective language-model-based source-language data selection method for cross-lingual transfer learning in large-scale spoken language understanding. The experimental results show that with data selection i) the amount of source data, and hence the training time, is reduced significantly, and ii) model performance is improved.
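The abstract does not spell out the exact scoring used for language-model-based selection, but a common instantiation of this idea is cross-entropy-difference ranking (Moore-Lewis style): score each source-language utterance under an in-domain LM and a general LM, and keep the utterances that look most in-domain. The following is a minimal, self-contained sketch of that scheme with toy unigram LMs; the function names, smoothing choice, and example data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def train_unigram_lm(sentences):
    """Train an add-one smoothed unigram LM; returns a token log-prob function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))
    return logprob

def cross_entropy(sentence, logprob):
    """Per-token negative log-likelihood of a sentence under an LM."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / max(len(tokens), 1)

def select_source_data(source_sents, in_domain_sents, general_sents, keep_ratio=0.5):
    """Rank source sentences by cross-entropy difference (in-domain LM minus
    general LM) and keep the lowest-scoring fraction, i.e. the most in-domain."""
    lm_in = train_unigram_lm(in_domain_sents)
    lm_gen = train_unigram_lm(general_sents)
    scored = sorted(
        source_sents,
        key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_gen),
    )
    return scored[: int(len(scored) * keep_ratio)]

# Toy usage: keep the half of the source data closest to the target domain.
source = ["play some jazz music", "what is the weather", "book a table for two"]
in_domain = ["play music", "play a song", "turn up the music"]
general = ["what is the weather", "book a table", "send an email"]
print(select_source_data(source, in_domain, general, keep_ratio=0.5))
```

Selecting only the highest-ranked source utterances is what yields the reduced training data and training time reported in the abstract; the selection threshold (here `keep_ratio`) is the knob that trades data size against coverage.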
- Anthology ID:
- D19-1153
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1455–1460
- URL:
- https://aclanthology.org/D19-1153
- DOI:
- 10.18653/v1/D19-1153
- Cite (ACL):
- Quynh Do and Judith Gaspers. 2019. Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1455–1460, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding (Do & Gaspers, EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/D19-1153.pdf