Abstract
Code-mixing is a frequent communication style among multilingual speakers, in which they mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is a challenging task because it co-exists with monolingual and noisy text. Over the years, several code-mixing metrics have been extensively used to identify and validate code-mixed text quality. This paper demonstrates several inherent limitations of code-mixing metrics with examples from existing datasets that are popularly used across various experiments.
- Anthology ID:
- 2021.calcs-1.2
- Volume:
- Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- CALCS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6–14
- URL:
- https://aclanthology.org/2021.calcs-1.2
- DOI:
- 10.18653/v1/2021.calcs-1.2
- Cite (ACL):
- Vivek Srivastava and Mayank Singh. 2021. Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 6–14, Online. Association for Computational Linguistics.
- Cite (Informal):
- Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text (Srivastava & Singh, CALCS 2021)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2021.calcs-1.2.pdf