Abstract
Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
- Anthology ID:
- 2024.emnlp-main.169
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2883–2899
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.169/
- DOI:
- 10.18653/v1/2024.emnlp-main.169
- Cite (ACL):
- Momose Oyama, Hiroaki Yamagiwa, and Hidetoshi Shimodaira. 2024. Understanding Higher-Order Correlations Among Semantic Components in Embeddings. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2883–2899, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Understanding Higher-Order Correlations Among Semantic Components in Embeddings (Oyama et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.169.pdf