Machine Translation Of Marathi Dialects: A Case Study Of Kadodi

Raj Dabre; Mary Dabre; Teresa Pereira

doi:10.18653/v1/2024.wat-1.3

Abstract

While Marathi is considered a low- to middle-resource language, its 42 dialects have largely been ignored, mainly because they are primarily spoken and rarely written, which makes them extremely low-resource. In this paper we explore the machine translation (MT) of Kadodi, also known as Samvedi, a dialect of Marathi. We first describe the Kadodi dialect, highlighting its differences from the standard dialect, and then present a manually curated dataset called Suman, consisting of a trilingual Kadodi-Marathi-English dictionary of 949 entries and 942 simple sentence triples and idioms created by native Kadodi speakers. We then evaluate 3 existing large language models (LLMs) that support Marathi, namely Gemma-2-9b, Sarvam-2b-0.5 and LLaMa-3.1-8b, using few-shot prompting to determine their efficacy for translation involving Kadodi. We observe that these models perform poorly on Kadodi even for simple sentences, indicating a dire situation.

Anthology ID: 2024.wat-1.3
Volume: Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024)
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Toshiaki Nakazawa, Isao Goto
Venue: WAT
Publisher: Association for Computational Linguistics
Pages: 36–44
URL: https://aclanthology.org/2024.wat-1.3
DOI: 10.18653/v1/2024.wat-1.3
Cite (ACL): Raj Dabre, Mary Dabre, and Teresa Pereira. 2024. Machine Translation Of Marathi Dialects: A Case Study Of Kadodi. In Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024), pages 36–44, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Machine Translation Of Marathi Dialects: A Case Study Of Kadodi (Dabre et al., WAT 2024)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.wat-1.3.pdf
Supplementary material: 2024.wat-1.3.SupplementaryMaterial.txt
