IndoNLP 2025 Shared Task: Romanized Sinhala to Sinhala Reverse Transliteration Using BERT

Sandun Sameera Perera, Lahiru Prabhath Jayakodi, Deshan Koshala Sumanathilaka, Isuri Anuradha


Abstract
The Romanized text has become popular with the growth of digital communication platforms, largely due to the familiarity with English keyboards. In Sri Lanka, Romanized Sinhala, commonly referred to as “Singlish” is widely used in digital communications. This paper introduces a novel context-aware back-transliteration system designed to address the ad-hoc typing patterns and lexical ambiguity inherent in Singlish. The proposed system com bines dictionary-based mapping for Singlish words, a rule-based transliteration for out of-vocabulary words and a BERT-based language model for addressing lexical ambiguities. Evaluation results demonstrate the robustness of the proposed approach, achieving high BLEU scores along with low Word Error Rate (WER) and Character Error Rate (CER) across test datasets. This study provides an effective solution for Romanized Sinhala back-transliteration and establishes the foundation for improving NLP tools for similar low-resourced languages.
Anthology ID:
2025.indonlp-1.16
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages
Month:
January
Year:
2025
Address:
Abu Dhabi
Editors:
Ruvan Weerasinghe, Isuri Anuradha, Deshan Sumanathilaka
Venues:
IndoNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
135–140
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.indonlp-1.16/
DOI:
Bibkey:
Cite (ACL):
Sandun Sameera Perera, Lahiru Prabhath Jayakodi, Deshan Koshala Sumanathilaka, and Isuri Anuradha. 2025. IndoNLP 2025 Shared Task: Romanized Sinhala to Sinhala Reverse Transliteration Using BERT. In Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages, pages 135–140, Abu Dhabi. Association for Computational Linguistics.
Cite (Informal):
IndoNLP 2025 Shared Task: Romanized Sinhala to Sinhala Reverse Transliteration Using BERT (Perera et al., IndoNLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.indonlp-1.16.pdf