Probing Taxonomic and Thematic Embeddings for Taxonomic Information

Filip Klubička, John Kelleher


Abstract
Modelling taxonomic and thematic relatedness is important for building AI with comprehensive natural language understanding. The goal of this paper is to learn more about how taxonomic information is structurally encoded in embeddings. To do this, we design a new hypernym-hyponym probing task and perform a comparative probing study of taxonomic and thematic SGNS and GloVe embeddings. Our experiments indicate that both types of embeddings encode some taxonomic information, but the amount and the geometric properties of the encodings are independently related to both the encoder architecture and the embedding training data. Specifically, we find that only taxonomic embeddings carry taxonomic information in their norm, which is determined by the underlying distribution in the data.
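The abstract does not spell out the probe itself, so the following is only a hypothetical sketch of what "taxonomic information in the norm" can mean operationally. It assumes a logistic-regression probe over ordered (a, b) word pairs, labelled 1 when a is the hypernym of b and 0 when the order is reversed, and compares a probe that sees the full vectors with one that sees only their L2 norms. All function names, the word ids (`hyper0`, `hypo0`, ...) and the norm-1/norm-2 split are invented toy assumptions and say nothing about the paper's actual data or results.

```python
import numpy as np

def pair_features(emb, pairs, mode):
    """Hypothetical probe inputs for ordered (a, b) word pairs.

    mode="concat": the full vectors [v_a ; v_b].
    mode="norm":   only the L2 norms [||v_a||, ||v_b||].
    """
    if mode == "concat":
        return np.array([np.concatenate([emb[a], emb[b]]) for a, b in pairs])
    if mode == "norm":
        return np.array([[np.linalg.norm(emb[a]), np.linalg.norm(emb[b])] for a, b in pairs])
    raise ValueError(f"unknown mode: {mode}")

def train_logreg_probe(X, y, lr=0.1, epochs=2000):
    """Linear probe: logistic regression fitted by batch gradient descent."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8  # standardise with training stats
    Xs = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # predicted P(label = 1)
        w -= lr * Xs.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    # The returned predictor applies the same standardisation to new inputs.
    return lambda Z: ((((Z - mu) / sd) @ w + b) > 0).astype(int)

def make_toy_data(n_pairs=50, dim=10, seed=0):
    """Invented toy embeddings: random directions, hypernyms scaled to norm 1,
    hyponyms to norm 2. Label 1 = (hypernym, hyponym), label 0 = reversed."""
    rng = np.random.default_rng(seed)
    emb = {}
    for i in range(n_pairs):
        for name, scale in ((f"hyper{i}", 1.0), (f"hypo{i}", 2.0)):
            v = rng.normal(size=dim)
            emb[name] = scale * v / np.linalg.norm(v)
    pairs = [(f"hyper{i}", f"hypo{i}") for i in range(n_pairs)]
    pairs += [(b, a) for a, b in pairs]  # reversed pairs as negatives
    labels = np.array([1] * n_pairs + [0] * n_pairs)
    order = rng.permutation(len(pairs))
    return emb, [pairs[j] for j in order], labels[order]

def probe_accuracy(mode, train_frac=0.7):
    """Train on the first train_frac of the shuffled pairs, score on the rest."""
    emb, pairs, y = make_toy_data()
    X = pair_features(emb, pairs, mode)
    cut = int(train_frac * len(y))
    predict = train_logreg_probe(X[:cut], y[:cut])
    return float(np.mean(predict(X[cut:]) == y[cut:]))

if __name__ == "__main__":
    for mode in ("norm", "concat"):
        print(mode, probe_accuracy(mode))
```

Because the toy data plants the signal only in vector length, the norm-only probe can separate the classes by construction, while a linear function of the raw coordinates cannot compute a vector's length, so the concatenation probe has no direct access to that signal. Keeping norm features separate from raw-vector features is one simple way to ask where in the geometry the information lives.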
Anthology ID: 2023.gwc-1.1
Volume: Proceedings of the 12th Global Wordnet Conference
Month: January
Year: 2023
Address: University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors: German Rigau, Francis Bond, Alexandre Rademaker
Venue: GWC
Publisher: Global Wordnet Association
Pages: 1–13
URL: https://aclanthology.org/2023.gwc-1.1
Cite (ACL): Filip Klubička and John Kelleher. 2023. Probing Taxonomic and Thematic Embeddings for Taxonomic Information. In Proceedings of the 12th Global Wordnet Conference, pages 1–13, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal): Probing Taxonomic and Thematic Embeddings for Taxonomic Information (Klubička & Kelleher, GWC 2023)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2023.gwc-1.1.pdf