A Structured Variational Autoencoder for Contextual Morphological Inflection

Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan Cotterell


Abstract
Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.
Anthology ID: P18-1245
Original: P18-1245v1
Version 2: P18-1245v2
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Iryna Gurevych, Yusuke Miyao
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2631–2641
URL: https://aclanthology.org/P18-1245
DOI: 10.18653/v1/P18-1245
Cite (ACL): Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, and Ryan Cotterell. 2018. A Structured Variational Autoencoder for Contextual Morphological Inflection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2631–2641, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): A Structured Variational Autoencoder for Contextual Morphological Inflection (Wolf-Sonkin et al., ACL 2018)
PDF: https://preview.aclanthology.org/ml4al-ingestion/P18-1245.pdf
Note: P18-1245.Notes.pdf
Poster: P18-1245.Poster.pdf