A Systematic Review on Machine Translation and Transliteration Techniques for Code-Mixed Indo-Aryan Languages

H. Rukshan Dias, Deshan Sumanathilaka


Abstract
In multilingual societies, speakers commonly blend multiple languages in communication, a phenomenon known as code-mixing. Globalization and the increasing influence of social media have further amplified multilingualism, resulting in wider use of code-mixing. This systematic review analyzes existing translation and transliteration techniques for code-mixed Indo-Aryan languages, ranging from rule-based and statistical approaches to neural machine translation and transformer-based architectures. It also examines publicly available code-mixed datasets designed for machine translation and transliteration tasks, along with the evaluation metrics introduced and commonly applied in prior studies. Finally, the paper discusses current challenges and limitations, highlighting future research directions for developing more tailored translation pipelines for code-mixed Indo-Aryan languages.
Anthology ID:
2025.wat-1.6
Volume:
Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Toshiaki Nakazawa, Isao Goto
Venues:
WAT | WS
Publisher:
Association for Computational Linguistics
Pages:
66–77
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wat-1.6/
Cite (ACL):
H. Rukshan Dias and Deshan Sumanathilaka. 2025. A Systematic Review on Machine Translation and Transliteration Techniques for Code-Mixed Indo-Aryan Languages. In Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025), pages 66–77, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
A Systematic Review on Machine Translation and Transliteration Techniques for Code-Mixed Indo-Aryan Languages (Dias & Sumanathilaka, WAT 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wat-1.6.pdf