Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences
Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung
Abstract
Training code-switched language models is difficult due to the lack of data and the complexity of the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this issue. However, these approaches require external word alignments or constituency parsers, which produce erroneous results on distant languages. We propose a sequence-to-sequence model using a copy mechanism to generate code-switching data by leveraging parallel monolingual translations from a limited source of code-switching data. The model learns how to combine words from parallel sentences and identifies when to switch from one language to the other. Moreover, it captures code-switching constraints by attending to and aligning the words in the inputs, without requiring any external knowledge. Based on experimental results, the language model trained with the generated sentences achieves state-of-the-art performance and improves end-to-end automatic speech recognition.
- Anthology ID:
- K19-1026
- Volume:
- Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Mohit Bansal, Aline Villavicencio
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 271–280
- URL:
- https://aclanthology.org/K19-1026
- DOI:
- 10.18653/v1/K19-1026
- Cite (ACL):
- Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung. 2019. Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 271–280, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences (Winata et al., CoNLL 2019)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/K19-1026.pdf
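The copy mechanism mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so this assumes a pointer-generator style mixture: at each decoding step, a vocabulary distribution and an attention distribution over the source tokens are combined using a generation probability. The function name `copy_mixture`, the toy English and Mandarin tokens, and all numeric values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def copy_mixture(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen         -- assumed probability of generating from the vocabulary
    vocab_dist    -- dict mapping token -> probability over a fixed vocabulary
    attention     -- attention weights over source positions (should sum to 1)
    source_tokens -- tokens of the two parallel sentences, concatenated
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        final[token] += p_gen * prob
    # Copy part: route the remaining mass to source tokens via attention.
    for weight, token in zip(attention, source_tokens):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy input (made up): an English sentence followed by a Mandarin translation.
source = ["i", "want", "coffee", "我", "要", "咖啡"]
attn = [0.05, 0.05, 0.1, 0.1, 0.1, 0.6]
vocab = {"i": 0.2, "want": 0.3, "coffee": 0.5}

dist = copy_mixture(0.25, vocab, attn, source)
best = max(dist, key=dist.get)
print(best, round(dist[best], 4))
```

With a low generation probability and attention concentrated on one source position, most of the mass goes to the copied token, which is one way a decoder of this kind could switch language mid-sentence by pointing into the other parallel sentence.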