Effective Architectures for Low Resource Multilingual Named Entity Transliteration

Molly Moran, Constantine Lignos


Abstract
In this paper, we evaluate LSTM, biLSTM, GRU, and Transformer architectures for the task of name transliteration in a many-to-one multilingual paradigm, transliterating from 590 languages to English. We experiment with different encoder-decoder combinations and evaluate them using accuracy, character error rate, and an F-measure based on longest continuous subsequences. We find that using a Transformer for the encoder and decoder performs best, improving accuracy by over 4 points compared to previous work. We explore whether manipulating the source text by adding macrolanguage flag tokens or pre-romanizing source strings can improve performance and find that neither manipulation has a positive effect. Finally, we analyze performance differences between the LSTM and Transformer encoders when using a Transformer decoder and find that the Transformer encoder is better able to handle insertions and substitutions when transliterating.
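The three metrics named in the abstract are not defined in this excerpt. As a rough illustration only, the following minimal Python sketch computes them under assumed standard definitions: exact-match word accuracy, Levenshtein-based character error rate, and an F-measure over the longest common subsequence in the style of the NEWS transliteration shared tasks. The function names and corpus-level averaging are assumptions for this sketch, not the authors' code, and the paper's exact formulations may differ.

def levenshtein(a: str, b: str) -> int:
    # Standard edit distance: insertions, deletions, substitutions all cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def lcs_length(a: str, b: str) -> int:
    # Length of the longest common subsequence of the two strings.
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def evaluate(hyps: list[str], refs: list[str]) -> dict[str, float]:
    # Corpus-level accuracy, character error rate, and mean LCS-based F-measure.
    # Assumes hyps and refs are aligned lists of equal length.
    correct = edits = ref_chars = f_total = 0.0
    for hyp, ref in zip(hyps, refs):
        correct += hyp == ref
        edits += levenshtein(hyp, ref)
        ref_chars += len(ref)
        lcs = lcs_length(hyp, ref)
        p = lcs / len(hyp) if hyp else 0.0  # precision: LCS over hypothesis length
        r = lcs / len(ref) if ref else 0.0  # recall: LCS over reference length
        f_total += 2 * p * r / (p + r) if p + r else 0.0
    n = len(refs)
    return {"accuracy": correct / n, "cer": edits / ref_chars, "mean_f": f_total / n}

print(evaluate(["smith", "jonson"], ["smith", "johnson"]))

Under these assumed definitions, a near-miss such as "jonson" for reference "johnson" scores 0 on exact-match accuracy but a CER of 1/7 and an F-measure of 12/13, which is why character-level metrics are typically reported alongside accuracy for transliteration.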
Anthology ID: 2020.loresmt-1.11
Volume: Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Month: December
Year: 2020
Address: Suzhou, China
Venue: LoResMT
Publisher: Association for Computational Linguistics
Pages: 79–86
URL: https://aclanthology.org/2020.loresmt-1.11
Cite (ACL): Molly Moran and Constantine Lignos. 2020. Effective Architectures for Low Resource Multilingual Named Entity Transliteration. In Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages, pages 79–86, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Effective Architectures for Low Resource Multilingual Named Entity Transliteration (Moran & Lignos, LoResMT 2020)
PDF: https://preview.aclanthology.org/auto-file-uploads/2020.loresmt-1.11.pdf