VISTA: Visual-Textual Knowledge Graph Representation Learning

Jaejun Lee, Chanyoung Chung, Hochang Lee, Sungho Jo, Joyce Whang


Abstract
Knowledge graphs represent human knowledge using triplets composed of entities and relations. While most existing knowledge graph embedding methods only consider the structure of a knowledge graph, a few recently proposed multimodal methods utilize images or text descriptions of entities in a knowledge graph. In this paper, we propose visual-textual knowledge graphs (VTKGs), where not only entities but also triplets can be explained using images, and both entities and relations can accompany text descriptions. By compiling visually expressible commonsense knowledge, we construct new benchmark datasets where triplets themselves are explained by images, and the meanings of entities and relations are described using text. We propose VISTA, a knowledge graph representation learning method for VTKGs, which incorporates the visual and textual representations of entities and relations using entity encoding, relation encoding, and triplet decoding transformers. Experiments show that VISTA outperforms state-of-the-art knowledge graph completion methods in real-world VTKGs.
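The abstract outlines a three-transformer pipeline (entity encoding, relation encoding, triplet decoding) without implementation detail. The PyTorch sketch below shows one plausible reading of that pipeline; it is not the authors' code. The class name VTKGScorer, all dimensions, the token layout, and the use of nn.TransformerEncoder stacks are illustrative assumptions, and the multimodal features that would come from pretrained image/text encoders are stubbed with random tensors.

# Minimal sketch of an entity-encoder / relation-encoder / triplet-decoder
# stack, assuming precomputed visual and textual features per entity,
# relation, and triplet. All names and shapes are hypothetical.
import torch
import torch.nn as nn

class VTKGScorer(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        make = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True), n_layers
        )
        self.entity_encoder = make()    # fuses an entity's structural, visual, textual tokens
        self.relation_encoder = make()  # fuses a relation's structural and textual tokens
        self.triplet_decoder = make()   # jointly encodes (head, relation) plus triplet-level image tokens
        self.score_head = nn.Linear(dim, dim)  # maps the decoded query into entity space

    def summarize(self, encoder, tokens):
        # Simplification: use the first output token as the fused embedding.
        return encoder(tokens)[:, 0]

    def forward(self, head_tokens, rel_tokens, triplet_img_tokens, all_entity_emb):
        h = self.summarize(self.entity_encoder, head_tokens)    # [B, dim]
        r = self.summarize(self.relation_encoder, rel_tokens)   # [B, dim]
        seq = torch.stack([h, r], dim=1)                        # [B, 2, dim]
        seq = torch.cat([seq, triplet_img_tokens], dim=1)       # append triplet image tokens
        q = self.score_head(self.triplet_decoder(seq)[:, 0])    # decoded query embedding
        return q @ all_entity_emb.T                             # scores over all candidate tails

# Toy usage with random stand-ins for the multimodal features.
B, dim, n_entities = 2, 256, 100
model = VTKGScorer(dim)
scores = model(
    torch.randn(B, 3, dim),       # head entity: structure + image + text tokens
    torch.randn(B, 2, dim),       # relation: structure + text tokens
    torch.randn(B, 1, dim),       # triplet-level image token(s)
    torch.randn(n_entities, dim), # embeddings of all entities
)
print(scores.shape)  # torch.Size([2, 100])

Scoring the decoded query against every entity embedding mirrors the standard knowledge graph completion setup, where link prediction is cast as ranking all candidate tail entities for a (head, relation, ?) query.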
Anthology ID: 2023.findings-emnlp.488
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7314–7328
URL: https://aclanthology.org/2023.findings-emnlp.488
DOI: 10.18653/v1/2023.findings-emnlp.488
Cite (ACL): Jaejun Lee, Chanyoung Chung, Hochang Lee, Sungho Jo, and Joyce Whang. 2023. VISTA: Visual-Textual Knowledge Graph Representation Learning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7314–7328, Singapore. Association for Computational Linguistics.
Cite (Informal): VISTA: Visual-Textual Knowledge Graph Representation Learning (Lee et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.488.pdf