Watset: Automatic Induction of Synsets from a Graph of Synonyms

Dmitry Ustalov, Alexander Panchenko, Chris Biemann


Abstract
This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources.
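The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository listed under Code) and rests on several assumptions: edge weights are omitted, connected components stand in for the hard clustering algorithm, and a neighbour's sense is chosen by simple membership rather than by a similarity measure. The names `components` and `induce_synsets` are hypothetical.

```python
from collections import defaultdict


def components(nodes, adjacency):
    """Stand-in hard clustering (an assumption): connected components
    of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(n for n in adjacency.get(node, ())
                         if n in nodes and n not in seen)
        clusters.append(cluster)
    return clusters


def induce_synsets(edges):
    # Step 1: build an undirected synonymy graph (weights omitted here).
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    # Step 2: word sense induction. Cluster each word's neighbourhood
    # (the word itself excluded); each cluster is one sense of the word.
    senses = {word: components(graph[word], graph) for word in graph}

    def sense_of(word, neighbour):
        # Simplified disambiguation: the sense of `word` whose
        # neighbourhood cluster contains `neighbour`.
        for index, context in enumerate(senses[word]):
            if neighbour in context:
                return (word, index)
        raise KeyError((word, neighbour))

    # Build the disambiguated graph whose nodes are (word, sense index).
    sense_graph = defaultdict(set)
    for word, contexts in senses.items():
        for index, context in enumerate(contexts):
            for neighbour in context:
                sense_graph[(word, index)].add(sense_of(neighbour, word))

    # Step 3: hard-cluster the sense graph, then drop the sense indices.
    # A word with several senses can land in several synsets, which makes
    # the overall result a fuzzy clustering of the original words.
    clusters = components(sense_graph.keys(), sense_graph)
    return {frozenset(word for word, _ in cluster) for cluster in clusters}


if __name__ == "__main__":
    toy_edges = [
        ("bank", "riverbank"), ("bank", "streambank"),
        ("riverbank", "streambank"),
        ("bank", "banking company"), ("bank", "financial institution"),
        ("banking company", "financial institution"),
    ]
    for synset in sorted(sorted(s) for s in induce_synsets(toy_edges)):
        print(synset)
```

In this toy graph the word "bank" receives two senses, so it appears in both resulting synsets, while every other word appears in exactly one.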
Anthology ID:
P17-1145
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1579–1590
URL:
https://aclanthology.org/P17-1145
DOI:
10.18653/v1/P17-1145
Cite (ACL):
Dmitry Ustalov, Alexander Panchenko, and Chris Biemann. 2017. Watset: Automatic Induction of Synsets from a Graph of Synonyms. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1579–1590, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Watset: Automatic Induction of Synsets from a Graph of Synonyms (Ustalov et al., ACL 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/P17-1145.pdf
Code:
dustalov/watset