Unsupervised Disambiguation of Syncretism in Inflected Lexicons
Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner
Abstract
Lexical ambiguity makes it difficult to compute useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token’s context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
- Anthology ID:
- N18-2087
- Original:
- N18-2087v1
- Version 2:
- N18-2087v2
- Volume:
- Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
- Month:
- June
- Year:
- 2018
- Address:
- New Orleans, Louisiana
- Editors:
- Marilyn Walker, Heng Ji, Amanda Stent
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 548–553
- URL:
- https://aclanthology.org/N18-2087
- DOI:
- 10.18653/v1/N18-2087
- Cite (ACL):
- Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, and Jason Eisner. 2018. Unsupervised Disambiguation of Syncretism in Inflected Lexicons. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 548–553, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Disambiguation of Syncretism in Inflected Lexicons (Cotterell et al., NAACL 2018)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/N18-2087.pdf
- Data:
- Universal Dependencies
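The abstract's central mechanism, using EM to partition each unigram type count among the possible analyses of that form, can be illustrated with a small sketch. This is not the authors' implementation. It swaps the paper's neural prior over feature bundles for a plain categorical prior (with optional add-alpha smoothing), assumes the lemma of each counted type is observed, and uses hypothetical names (em_partition, paradigms, counts) and toy English data chosen for illustration.

```python
from collections import defaultdict


def em_partition(paradigms, counts, n_iters=50, alpha=0.0):
    """Split each observed type count among the feature bundles it may realize.

    Hypothetical input format (an assumption, not the paper's):
      paradigms: {lemma: {bundle: form}}  the inflected lexicon
      counts:    {(lemma, form): count}   unigram type counts, lemma assumed known
    Returns (prior over bundles, {(lemma, form): {bundle: fractional count}}).
    """
    bundles = sorted({b for table in paradigms.values() for b in table})
    # Candidate analyses of each (lemma, form): every paradigm cell the form fills.
    candidates = defaultdict(list)
    for lemma, table in paradigms.items():
        for bundle, form in table.items():
            candidates[(lemma, form)].append(bundle)

    prior = {b: 1.0 / len(bundles) for b in bundles}  # uniform initialization
    partition = {}
    for _ in range(n_iters):
        # Expected bundle counts, starting from add-alpha pseudo-counts (alpha=0: plain MLE).
        expected = {b: alpha for b in bundles}
        partition = {}
        for key, c in counts.items():
            if c <= 0:
                continue
            cands = candidates.get(key)
            if not cands:
                raise KeyError(f"{key} is not in the lexicon")
            # E-step: the posterior over this form's analyses is proportional to the prior.
            z = sum(prior[b] for b in cands)
            partition[key] = {b: c * prior[b] / z for b in cands}
            for b, share in partition[key].items():
                expected[b] += share
        # M-step: re-estimate the categorical prior from the expected bundle counts.
        total = sum(expected.values())
        prior = {b: expected[b] / total for b in bundles}
    return prior, partition


# Toy lexicon: "walked" is syncretic (PST and PST.PTCP), while "sang" and "sung" are not.
paradigms = {
    "walk": {"PRS.3SG": "walks", "PRS.PTCP": "walking", "PST": "walked", "PST.PTCP": "walked"},
    "sing": {"PRS.3SG": "sings", "PRS.PTCP": "singing", "PST": "sang", "PST.PTCP": "sung"},
}
counts = {
    ("walk", "walks"): 3, ("walk", "walking"): 2, ("walk", "walked"): 10,
    ("sing", "sings"): 4, ("sing", "singing"): 1, ("sing", "sang"): 6, ("sing", "sung"): 2,
}
prior, partition = em_partition(paradigms, counts)
print(partition[("walk", "walked")])  # about 7.5 for PST and 2.5 for PST.PTCP
```

In this toy run, the unambiguous forms sang and sung suggest that PST is three times as frequent as PST.PTCP, so EM converges to splitting the 10 tokens of the syncretic walked in the same 3:1 ratio. A smoothed prior such as the paper's neural one matters mainly for rare bundles, which this unsmoothed categorical version cannot generalize to.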