Abstract
The main challenge in English-Malay cross-lingual emotion classification is the absence of Malay emotion corpora for training. Because machine translation can fall short on contextually complex tweets, we limit machine translation to the word level. In this paper, we bridge the language gap between English and Malay through cross-lingual word embeddings constructed using singular value decomposition. We pre-train our hierarchical attention model on English tweets and fine-tune it on a set of gold-standard Malay tweets. Our model uses significantly fewer computational resources than the language models. Experimental results show that our model outperforms mBERT by 2.4% in zero-shot learning and Malay BERT by 0.8% when a limited number of Malay tweets is available. While requiring 6–7 times less computation time, our model lags behind mBERT and XLM-RoBERTa by only 0.9–4.3% in few-shot learning. In addition, the word-level attention can be transferred accurately to the Malay tweets using the cross-lingual word embeddings.
- Anthology ID:
- 2022.wassa-1.12
- Volume:
- Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Jeremy Barnes, Orphée De Clercq, Valentin Barriere, Shabnam Tafreshi, Sawsan Alqahtani, João Sedoc, Roman Klinger, Alexandra Balahur
- Venue:
- WASSA
- Publisher:
- Association for Computational Linguistics
- Pages:
- 113–124
- URL:
- https://aclanthology.org/2022.wassa-1.12
- DOI:
- 10.18653/v1/2022.wassa-1.12
- Cite (ACL):
- Ying Hao Lim and Jasy Suet Yan Liew. 2022. English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network. In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pages 113–124, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network (Lim & Liew, WASSA 2022)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2022.wassa-1.12.pdf