Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, Anna Korhonen


Abstract
Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that have similar distributional signatures. These effects are detrimental for language understanding systems, which may infer that ‘inexpensive’ is a rephrasing for ‘expensive’ or may not associate ‘acquire’ with ‘acquires’. In this work, we propose a novel morph-fitting procedure which moves past the use of curated semantic lexicons for improving distributional vector spaces. Instead, our method injects morphological constraints generated using simple language-specific rules, pulling inflectional forms of the same word close together and pushing derivational antonyms far apart. In intrinsic evaluation over four languages, we show that our approach: 1) improves low-frequency word estimates; and 2) boosts the semantic quality of the entire word vector collection. Finally, we show that morph-fitted vectors yield large gains in the downstream task of dialogue state tracking, highlighting the importance of morphology for tackling long-tail phenomena in language understanding tasks.
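The constraint-generation idea in the abstract can be illustrated with a minimal sketch. The suffix and prefix lists below are hypothetical placeholders, not the paper's actual rule sets, and the function name `generate_constraints` is invented for illustration. The sketch only shows how simple string rules over a vocabulary could yield "attract" pairs (inflectional forms) and "repel" pairs (derivational antonyms); it does not perform the vector-space fine-tuning itself.

```python
# Hypothetical English rule sets, chosen for illustration only.
INFLECTION_SUFFIXES = ("s", "ed", "ing")
ANTONYM_PREFIXES = ("un", "in", "dis")


def generate_constraints(vocab):
    """Return (attract, repel) sets of word pairs found in the vocabulary.

    attract: (word, word + suffix) pairs, treated as inflectional variants.
    repel:   (word, prefix + word) pairs, treated as derivational antonyms.
    """
    vocab = set(vocab)
    attract, repel = set(), set()
    for word in vocab:
        for suffix in INFLECTION_SUFFIXES:
            candidate = word + suffix
            if candidate in vocab:
                attract.add((word, candidate))
        for prefix in ANTONYM_PREFIXES:
            candidate = prefix + word
            if candidate in vocab:
                repel.add((word, candidate))
    return attract, repel


attract, repel = generate_constraints(
    ["acquire", "acquires", "expensive", "inexpensive"]
)
```

With the abstract's own examples, `attract` pairs 'acquire' with 'acquires' and `repel` pairs 'expensive' with 'inexpensive'. Purely string-based rules like these can overgenerate (for instance, a prefix match that is not an antonym), so any real rule set would need language-specific care.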
Anthology ID:
P17-1006
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
56–68
URL:
https://aclanthology.org/P17-1006
DOI:
10.18653/v1/P17-1006
Cite (ACL):
Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, and Anna Korhonen. 2017. Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 56–68, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules (Vulić et al., ACL 2017)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/P17-1006.pdf
Note:
P17-1006.Notes.zip
Video:
https://preview.aclanthology.org/ingest-bitext-workshop/P17-1006.mp4