Abstract
Bilingual Lexicon Induction (BLI) is the task of translating words from corpora in two languages. Recent advances in BLI work by aligning the two word embedding spaces. Following that, a key step is to retrieve the nearest neighbor (NN) in the target space given the source word. However, a phenomenon called hubness often degrades the accuracy of NN. Hubness appears as some data points, called hubs, being extraordinarily close to many of the other data points. Reducing hubness is necessary for retrieval tasks. One successful example is Inverted SoFtmax (ISF), recently proposed to improve NN. This work proposes a new method, Hubless Nearest Neighbor (HNN), to mitigate hubness. HNN differs from NN by imposing an additional equal preference assumption. Moreover, the HNN formulation explains why ISF works as well as it does. Empirical results demonstrate that HNN outperforms NN, ISF, and other state-of-the-art methods. For reproducibility and follow-ups, we have published all code.
- Anthology ID:
- P19-1399
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4072–4080
- URL:
- https://aclanthology.org/P19-1399
- DOI:
- 10.18653/v1/P19-1399
- Cite (ACL):
- Jiaji Huang, Qiang Qiu, and Kenneth Church. 2019. Hubless Nearest Neighbor Search for Bilingual Lexicon Induction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4072–4080, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Hubless Nearest Neighbor Search for Bilingual Lexicon Induction (Huang et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/P19-1399.pdf
- Code:
- baidu-research/HNN
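
The hubness problem summarized in the abstract can be illustrated with a minimal Python sketch. The similarity matrix `sim`, the function names, and the temperature `beta` are hypothetical choices made for illustration; they are not taken from the paper or from the baidu-research/HNN repository. The inverted softmax shown here assumes the common formulation in which similarities are normalized over source words rather than over target words. HNN itself is not implemented.

```python
import numpy as np

# Hypothetical source-to-target similarities (rows: source words, columns: target words).
# Target 0 acts as a hub: it is the most similar target for every source word.
sim = np.array([
    [0.90, 0.80, 0.10],
    [0.90, 0.20, 0.85],
    [0.95, 0.30, 0.20],
])


def nn_retrieve(sim):
    """Plain nearest neighbor: each source word picks its most similar target."""
    return sim.argmax(axis=1)


def isf_retrieve(sim, beta=10.0):
    """Inverted softmax (assumed formulation): normalize exp(beta * sim)
    over source words, so each target column sums to 1, then pick the
    best target for each source word. `beta` is an illustrative value."""
    logits = beta * sim
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=0, keepdims=True)
    return probs.argmax(axis=1)


def hit_counts(retrieved, n_targets):
    """How many source words retrieved each target; large counts indicate hubs."""
    return np.bincount(retrieved, minlength=n_targets)


nn = nn_retrieve(sim)
isf = isf_retrieve(sim)
print("NN :", nn.tolist(), "hits per target:", hit_counts(nn, 3).tolist())
print("ISF:", isf.tolist(), "hits per target:", hit_counts(isf, 3).tolist())
```

In this toy case plain NN maps all three source words to target 0, while the inverted softmax assigns them to targets 1, 2, and 0 respectively, so no single target absorbs every query.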