Joint Approach to Deromanization of Code-mixed Texts

Rashed Rubby Riyadh; Grzegorz Kondrak

doi:10.18653/v1/W19-1403

Abstract

The conversion of romanized texts back to the native scripts is a challenging task because of the inconsistent romanization conventions and non-standard language use. This problem is compounded by code-mixing, i.e., using words from more than one language within the same discourse. In this paper, we propose a novel approach for handling these two problems together in a single system. Our approach combines three components: language identification, back-transliteration, and sequence prediction. The results of our experiments on Bengali and Hindi datasets establish the state of the art for the task of deromanization of code-mixed texts.
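The three components named in the abstract can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions and is not the authors' system: the lexicon, the English fallback rule, the bigram scores, and all function names are hypothetical toy stand-ins, and English tokens are assumed to be left unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: romanized Hindi -> candidate Devanagari spellings.
# A real back-transliteration model would generate candidates instead.
CANDIDATES: Dict[str, List[str]] = {
    "main": ["मैं", "में"],
    "ghar": ["घर"],
    "ja": ["जा"],
    "raha": ["रहा"],
    "hoon": ["हूँ"],
}

# Hypothetical bigram scores standing in for a trained sequence model.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "मैं"): 1.0,
    ("घर", "में"): 1.0,
}


def identify_language(token: str) -> str:
    """Toy language identification: known romanized Hindi, else English."""
    return "hi" if token.lower() in CANDIDATES else "en"


def back_transliterate(token: str) -> List[str]:
    """Toy back-transliteration: look up native-script candidates."""
    return CANDIDATES.get(token.lower(), [token])


def deromanize(sentence: str) -> str:
    """Pick the best-scoring candidate sequence with a Viterbi-style search."""
    lattice = [
        back_transliterate(tok) if identify_language(tok) == "hi" else [tok]
        for tok in sentence.split()
    ]
    # best maps the last chosen word to (score, path so far).
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for options in lattice:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for cand in options:
            score, path = max(
                ((s + BIGRAM.get((prev, cand), 0.0), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cand] = (score, path + [cand])
        best = nxt
    return " ".join(max(best.values(), key=lambda item: item[0])[1])


print(deromanize("main school ja raha hoon"))
print(deromanize("ghar main"))
```

In this toy setup the same romanized token "main" resolves to different native-script words depending on its neighbours, which is the kind of ambiguity that motivates treating the three steps together rather than in isolation.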

Anthology ID: W19-1403
Volume: Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: June
Year: 2019
Address: Ann Arbor, Michigan
Editors: Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 26–34
URL: https://aclanthology.org/W19-1403
DOI: 10.18653/v1/W19-1403
Cite (ACL): Rashed Rubby Riyadh and Grzegorz Kondrak. 2019. Joint Approach to Deromanization of Code-mixed Texts. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 26–34, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal): Joint Approach to Deromanization of Code-mixed Texts (Riyadh & Kondrak, VarDial 2019)
PDF: https://preview.aclanthology.org/autopr/W19-1403.pdf
