Abstract
This paper designs a Monolingual Lexicon Induction task and observes that two factors accompany the degraded accuracy of bilingual lexicon induction for rare words. The first is a diminishing margin between similarities in the low-frequency regime; the second is exacerbated hubness at low frequency. Based on these observations, we propose two methods, one addressing each factor. Hubness is the larger issue, and addressing it improves induction accuracy significantly, especially for low-frequency words.
- Anthology ID:
- 2020.emnlp-main.100
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1310–1314
- URL:
- https://aclanthology.org/2020.emnlp-main.100
- DOI:
- 10.18653/v1/2020.emnlp-main.100
- Cite (ACL):
- Jiaji Huang, Xingyu Cai, and Kenneth Church. 2020. Improving Bilingual Lexicon Induction for Low Frequency Words. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1310–1314, Online. Association for Computational Linguistics.
- Cite (Informal):
- Improving Bilingual Lexicon Induction for Low Frequency Words (Huang et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.emnlp-main.100.pdf