Abstract
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations. We propose a method to identify typos in translation annotations. Using the extracted morphological data, we develop multilingual neural models for predicting three types of word formation—clipping, contraction, and eye dialect—and improve upon a standard attention baseline by using copy attention.
- Anthology ID:
- 2020.coling-main.413
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 4683–4692
- URL:
- https://aclanthology.org/2020.coling-main.413
- DOI:
- 10.18653/v1/2020.coling-main.413
- Cite (ACL):
- Winston Wu and David Yarowsky. 2020. Wiktionary Normalization of Translations and Morphological Information. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4683–4692, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Wiktionary Normalization of Translations and Morphological Information (Wu & Yarowsky, COLING 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.413.pdf