Wiktionary Normalization of Translations and Morphological Information

Winston Wu, David Yarowsky


Abstract
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations. We propose a method to identify typos in translation annotations. Using the extracted morphological data, we develop multilingual neural models for predicting three types of word formation—clipping, contraction, and eye dialect—and improve upon a standard attention baseline by using copy attention.
Anthology ID:
2020.coling-main.413
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4683–4692
URL:
https://aclanthology.org/2020.coling-main.413
DOI:
10.18653/v1/2020.coling-main.413
Cite (ACL):
Winston Wu and David Yarowsky. 2020. Wiktionary Normalization of Translations and Morphological Information. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4683–4692, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Wiktionary Normalization of Translations and Morphological Information (Wu & Yarowsky, COLING 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.413.pdf