@inproceedings{chathuranga-ranathunga-2021-classification,
title = "Classification of Code-Mixed Text Using Capsule Networks",
author = "Chathuranga, Shanaka and
Ranathunga, Surangika",
editor = "Mitkov, Ruslan and
Angelova, Galia",
booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
month = sep,
year = "2021",
address = "Held Online",
publisher = "INCOMA Ltd.",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.ranlp-1.30/",
pages = "256--263",
abstract = "A major challenge in analysing social me-dia data belonging to languages that use non-English script is its code-mixed nature. Recentresearch has presented state-of-the-art contex-tual embedding models (both monolingual s.a.BERT and multilingual s.a.XLM-R) as apromising approach. In this paper, we showthat the performance of such embedding mod-els depends on multiple factors, such as thelevel of code-mixing in the dataset, and thesize of the training dataset. We empiricallyshow that a newly introduced Capsule+biGRUclassifier could outperform a classifier built onthe English-BERT as well as XLM-R just witha training dataset of about 6500 samples forthe Sinhala-English code-mixed data."
}