Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment
Ryo Nagata, Hiroya Takamura, Naoki Otani, Yoshifumi Kawasaki
Abstract
In this paper, we propose methods for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. Unlike previous methods, the proposed methods do not require alignments between words and/or corpora for comparison. All they require is the computation of variance (or norms of mean word vectors) for each word type. Nevertheless, they rival the best-performing system in SemEval-2020 Task 1. In addition, they are (i) robust to skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing in one of the two corpora under comparison. We show these advantages for historical corpora and also for native/non-native English corpora.
- Anthology ID:
- 2023.emnlp-main.965
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15609–15622
- URL:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.965/
- DOI:
- 10.18653/v1/2023.emnlp-main.965
- Cite (ACL):
- Ryo Nagata, Hiroya Takamura, Naoki Otani, and Yoshifumi Kawasaki. 2023. Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15609–15622, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment (Nagata et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.965.pdf
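
The abstract's key idea, that the norm of a word's mean vector reflects a kind of variance, can be illustrated with a small sketch. It assumes each word instance is represented by a vector scaled to unit length (an assumption made here for the illustration); under that assumption, the mean squared distance of the vectors from their mean equals one minus the squared norm of the mean. The function names and the toy 2-d vectors below are hypothetical, and this is not the authors' implementation or their exact scoring function.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]

def mean_norm(vectors):
    """Norm of the mean of the unit-normalized vectors (between 0 and 1)."""
    m = mean_vector([normalize(v) for v in vectors])
    return math.sqrt(sum(x * x for x in m))

def spread(vectors):
    """Mean squared distance of the unit-normalized vectors from their mean."""
    unit = [normalize(v) for v in vectors]
    m = mean_vector(unit)
    return sum(sum((x - y) ** 2 for x, y in zip(u, m)) for u in unit) / len(unit)

# Toy vectors for one word type in two hypothetical corpora.
narrow = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]  # instances point the same way
broad = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # instances point in varied directions

for name, vecs in (("narrow", narrow), ("broad", broad)):
    print(name, round(mean_norm(vecs), 3), round(spread(vecs), 3))
```

In this toy setting, the word whose instances point in varied directions gets a shorter mean vector and a larger spread, and no alignment between the two sets of vectors is needed to compare the two numbers.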