Word Rotator’s Distance

Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui


Abstract
One key principle for assessing textual similarity is measuring the degree of semantic overlap between texts by considering the word alignment. Such alignment-based approaches are both intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. We focus on the fact that the norm of a word vector is a good proxy for word importance, and its angle is a good proxy for word similarity. Alignment-based approaches do not distinguish between the norm and direction, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose decoupling word vectors into their norm and direction and then computing the alignment-based similarity with the help of earth mover’s distance (optimal transport), which we refer to as word rotator’s distance. Furthermore, we demonstrate how to grow the norm and direction of word vectors (vector converter); this is a new systematic approach derived from the sentence-vector estimation methods, which can significantly improve the performance of the proposed method. On several STS benchmarks, the proposed methods outperform not only alignment-based approaches but also strong baselines. The source code is available at https://github.com/eumesy/wrd
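The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that the norms are normalized into probability masses, that the transport cost is the cosine distance between directions, and that the optimal transport problem is solved as a small linear program with SciPy. The function name `word_rotators_distance` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def word_rotators_distance(X, Y):
    """Sketch of an alignment-based distance between two sentences.

    X, Y: arrays of shape (n, d) and (m, d), one nonzero word vector per row.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Decouple each word vector into its norm (importance) and direction (meaning).
    x_norm = np.linalg.norm(X, axis=1)
    y_norm = np.linalg.norm(Y, axis=1)
    x_dir = X / x_norm[:, None]
    y_dir = Y / y_norm[:, None]

    # Assumption: norms are normalized so each sentence carries total mass 1.
    a = x_norm / x_norm.sum()
    b = y_norm / y_norm.sum()

    # Assumption: cost of aligning two words is the cosine distance of their directions.
    cost = np.clip(1.0 - x_dir @ y_dir.T, 0.0, None)
    n, m = cost.shape

    # Transport plan T is flattened row-major: T[i, j] sits at index i * m + j.
    # Constraints: each row of T sums to a[i], each column sums to b[j].
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([a, b])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


# Toy 2-dimensional "word vectors" (made up for illustration).
sentence_a = [[3.0, 0.0], [0.0, 1.0]]
sentence_b = [[1.0, 0.0], [0.0, 1.0]]
distance = word_rotators_distance(sentence_a, sentence_b)
```

In this toy case the two sentences share the same directions but weight them differently, so the distance comes only from the mass that has to move between the two directions. Solving the transport problem as a dense linear program is convenient for short sentences; a dedicated optimal transport solver would be the more usual choice at scale.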
Anthology ID:
2020.emnlp-main.236
Original:
2020.emnlp-main.236v1
Version 2:
2020.emnlp-main.236v2
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2944–2960
URL:
https://aclanthology.org/2020.emnlp-main.236
DOI:
10.18653/v1/2020.emnlp-main.236
Cite (ACL):
Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, and Kentaro Inui. 2020. Word Rotator’s Distance. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2944–2960, Online. Association for Computational Linguistics.
Cite (Informal):
Word Rotator’s Distance (Yokoi et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.236.pdf
Video:
https://slideslive.com/38939100
Code:
eumesy/wrd