Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

Vivek Srivastava; Mayank Singh

doi:10.18653/v1/2021.calcs-1.2

Abstract

Code-mixing is a frequent communication style among multilingual speakers, who mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is challenging because it co-exists with monolingual and noisy text. Over the years, several code-mixing metrics have been used extensively to identify code-mixed text and validate its quality. This paper demonstrates several inherent limitations of these metrics, using examples from existing datasets that are widely used across various experiments.

Anthology ID: 2021.calcs-1.2
Volume: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month: June
Year: 2021
Address: Online
Venue: CALCS
Publisher: Association for Computational Linguistics
Pages: 6–14
URL: https://aclanthology.org/2021.calcs-1.2
DOI: 10.18653/v1/2021.calcs-1.2
Cite (ACL): Vivek Srivastava and Mayank Singh. 2021. Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 6–14, Online. Association for Computational Linguistics.
Cite (Informal): Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text (Srivastava & Singh, CALCS 2021)
PDF: https://preview.aclanthology.org/auto-file-uploads/2021.calcs-1.2.pdf
