Character Alignment in Morphologically Complex Translation Sets for Related Languages

Michael Gasser, Binyam Ephrem Seyoum, Nazareth Amlesom Kifle


Abstract
For languages with complex morphology, word-to-word translation is a task with various potential applications, for example, in information retrieval, language instruction, and dictionary creation, as well as in machine translation. In this paper, we confine ourselves to the subtask of character alignment for the particular case of families of related languages with very few resources for most or all members. There are many such families; we focus on the subgroup of Semitic languages spoken in Ethiopia and Eritrea. We begin with an adaptation of the familiar alignment algorithms behind statistical machine translation, modifying them as appropriate for our task. We show how character alignment can reveal morphological, phonological, and orthographic correspondences among related languages.
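As a rough illustration of the kind of alignment algorithm the abstract refers to, the sketch below applies plain IBM Model 1 (the simplest of the classic statistical machine translation alignment models) to characters instead of words. It is not the authors' adapted algorithm, which the abstract does not spell out. The function names, the toy strings, and the omission of a NULL source symbol are all assumptions made for illustration.

```python
from collections import defaultdict


def train_char_ibm1(pairs, iterations=10):
    """Estimate t(target_char | source_char) with EM, as in IBM Model 1.

    pairs: list of (source_word, target_word) strings (hypothetical data).
    Simplification: no NULL source symbol is used.
    """
    tgt_chars = {c for _, tgt in pairs for c in tgt}
    # Uniform initialisation over every co-occurring character pair.
    t = {}
    for src, tgt in pairs:
        for sc in src:
            for tc in tgt:
                t[(tc, sc)] = 1.0 / len(tgt_chars)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (tc, sc)
        total = defaultdict(float)  # expected counts for sc
        for src, tgt in pairs:
            for tc in tgt:
                # E-step: share each target character among source characters.
                norm = sum(t[(tc, sc)] for sc in src)
                for sc in src:
                    frac = t[(tc, sc)] / norm
                    count[(tc, sc)] += frac
                    total[sc] += frac
        # M-step: renormalise the expected counts per source character.
        t = {(tc, sc): c / total[sc] for (tc, sc), c in count.items()}
    return t


def align_chars(src, tgt, t):
    """For each target character, return the index of its best source character."""
    return [
        max(range(len(src)), key=lambda i: t.get((tc, src[i]), 0.0))
        for tc in tgt
    ]


# Toy, made-up word pairs; real input would be translation sets of related words.
toy_pairs = [("ab", "xy"), ("a", "x")]
toy_table = train_char_ibm1(toy_pairs)
toy_alignment = align_chars("ab", "xy", toy_table)
```

On the toy pairs, the second pair pushes probability mass toward the correspondence between "a" and "x", which in turn leaves "b" to explain "y". Recurring character correspondences of this kind are what such a model can surface across related languages.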
Anthology ID:
2020.vardial-1.5
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue:
VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
47–56
URL:
https://aclanthology.org/2020.vardial-1.5
Cite (ACL):
Michael Gasser, Binyam Ephrem Seyoum, and Nazareth Amlesom Kifle. 2020. Character Alignment in Morphologically Complex Translation Sets for Related Languages. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 47–56, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Character Alignment in Morphologically Complex Translation Sets for Related Languages (Gasser et al., VarDial 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2020.vardial-1.5.pdf