Design Challenges in Named Entity Transliteration

Yuval Merhav, Stephen Ash


Abstract
We analyze some of the fundamental design challenges that impact the development of a multilingual state-of-the-art named entity transliteration system, including curating bilingual named entity datasets and evaluating multiple transliteration methods. We empirically evaluate the traditional weighted finite state transducer (WFST) approach to transliteration against two neural approaches: the encoder-decoder recurrent neural network method and the recent, non-sequential Transformer method. To improve the availability of bilingual named entity transliteration datasets, we release personal name bilingual dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic, and Japanese Katakana. Our code and dictionaries are publicly available.
Anthology ID:
C18-1053
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
630–640
URL:
https://aclanthology.org/C18-1053
Cite (ACL):
Yuval Merhav and Stephen Ash. 2018. Design Challenges in Named Entity Transliteration. In Proceedings of the 27th International Conference on Computational Linguistics, pages 630–640, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Design Challenges in Named Entity Transliteration (Merhav & Ash, COLING 2018)
PDF:
https://preview.aclanthology.org/update-css-js/C18-1053.pdf
Code
 steveash/NETransliteration-COLING2018