Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung


Abstract
The lack of text data has been the major issue in code-switching language modeling. In this paper, we introduce a multi-task learning based language model that shares a syntax representation of languages to leverage linguistic information and tackle the low-resource data issue. Our model jointly learns both language modeling and Part-of-Speech tagging on code-switched utterances. In this way, the model is able to identify the location of code-switching points and improve the prediction of the next word. Our approach outperforms a standard LSTM-based language model, with improvements of 9.7% and 7.4% in perplexity on the SEAME Phase I and Phase II datasets, respectively.
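As an illustration only, the joint objective and the evaluation metric mentioned in the abstract can be sketched in plain Python. This is not the authors' implementation: the function names, the weighted-sum form of the joint loss, and the `pos_weight` hyperparameter are assumptions made for this example, and the last helper assumes the reported percentages are relative reductions in perplexity.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability list."""
    return -math.log(probs[target])


def joint_loss(word_probs, next_word, pos_probs, pos_tag, pos_weight=0.5):
    """Weighted sum of a next-word loss and a POS-tagging loss.

    pos_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    lm_loss = cross_entropy(word_probs, next_word)
    pos_loss = cross_entropy(pos_probs, pos_tag)
    return lm_loss + pos_weight * pos_loss


def perplexity(token_nlls):
    """Exponential of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_improvement(baseline_ppl, model_ppl):
    """Relative perplexity reduction over a baseline, in percent."""
    return 100.0 * (baseline_ppl - model_ppl) / baseline_ppl
```

In a multi-task setup of this kind, both loss terms would be computed from a shared representation, so gradients from the tagging task also shape the features used for next-word prediction.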
Anthology ID:
W18-3207
Volume:
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Thamar Solorio, Mona Diab, Julia Hirschberg
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
62–67
URL:
https://aclanthology.org/W18-3207
DOI:
10.18653/v1/W18-3207
Cite (ACL):
Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung. 2018. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, pages 62–67, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning (Winata et al., ACL 2018)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W18-3207.pdf