Abstract
Cross-lingual classification poses a significant challenge in Natural Language Processing (NLP), especially for languages with scarce training data. This paper investigates the adaptation of ensemble learning to address this challenge, specifically for disaster-related social media texts. First, we employ Machine Translation to generate a parallel corpus in the target language, mitigating data scarcity and providing a more robust training environment. We then apply the bagging ensemble technique, combining multiple classifiers into a single model that outperforms the individual classifiers. Our experimental results show significant improvements in adapting models for Arabic using only English training data, markedly outperforming models intended for languages linguistically similar to English; our ensemble model achieves an accuracy and F1 score of 0.78 when tested on original Arabic data. This research contributes to the field of cross-lingual classification, establishing a new benchmark for enhancing the effectiveness of language transfer in linguistically challenging scenarios.
- Anthology ID: 2024.loresmt-1.16
- Volume: Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
- Venues: LoResMT | WS
- Publisher: Association for Computational Linguistics
- Pages: 159–165
- URL: https://aclanthology.org/2024.loresmt-1.16
- DOI: 10.18653/v1/2024.loresmt-1.16
- Cite (ACL): Shareefa Al Amer, Mark Lee, and Phillip Smith. 2024. Adopting Ensemble Learning for Cross-lingual Classification of Crisis-related Text On Social Media. In Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024), pages 159–165, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Adopting Ensemble Learning for Cross-lingual Classification of Crisis-related Text On Social Media (Al Amer et al., LoResMT-WS 2024)
- PDF: https://preview.aclanthology.org/autopr/2024.loresmt-1.16.pdf
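
The bagging step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner (a tiny Naive Bayes over whitespace tokens), the labels `crisis` and `other`, the toy sentences, and all function names are assumptions made for illustration. In the paper's setting the training texts would be machine-translated into the target language; the English toy sentences here only stand in for that data.

```python
import math
import random
from collections import Counter


def bootstrap(data, rng):
    """Draw len(data) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def majority_vote(votes):
    """Return the most frequent label among the base classifiers' votes."""
    return Counter(votes).most_common(1)[0][0]


class NaiveBayes:
    """Tiny multinomial Naive Bayes; a hypothetical stand-in base learner."""

    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in data:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_bagging(data, n_models=5, seed=0):
    """Fit each base learner on its own bootstrap replicate of the data."""
    rng = random.Random(seed)
    return [NaiveBayes().fit(bootstrap(data, rng)) for _ in range(n_models)]


def predict_bagging(models, text):
    """Aggregate the base learners' predictions by majority vote."""
    return majority_vote([m.predict(text) for m in models])


# Toy data (assumed); real inputs would be translated crisis-related posts.
TRAIN = [
    ("flood waters rising need rescue", "crisis"),
    ("earthquake damaged many buildings", "crisis"),
    ("fire spreading evacuate now", "crisis"),
    ("great coffee this morning", "other"),
    ("watching a movie tonight", "other"),
    ("new phone arrived today", "other"),
]

models = train_bagging(TRAIN, n_models=5, seed=0)
print(predict_bagging(models, "flood rescue needed"))
```

Each base learner sees a different resampling of the training set, so their individual errors are less correlated, and the vote tends to be more stable than any single learner. Whether the paper uses hard voting or another aggregation rule is not stated in the abstract.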