Automating the Generation of a Functional Semantic Types Ontology with Foundational Models

Sachin Konan, Larry Rudolph, Scott Affens


Abstract
The rise of data science, the inherent dirtiness of data, and the proliferation of data providers have increased the value proposition of Semantic Types. Semantic Types are a way of encoding contextual information onto a data schema that informs the user about the definitional meaning of data, its broader context, and its relationships to other types. We increasingly see a world where providing structure to this information, attached directly to data, will enable both people and systems to better understand the content of a dataset and to efficiently automate data tasks such as validation, mapping/joins, and eventually machine learning. While ontological systems exist, they have not been widely adopted because of challenges in mapping to operational datasets and a lack of specificity in entity types. Additionally, the validation checks associated with data are stored in code bases separate from the datasets that are distributed. In this paper, we address both challenges holistically by proposing a system that efficiently maps and encodes functional meaning on Semantic Types.
Anthology ID:
2024.naacl-industry.21
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi Yang, Aida Davani, Avi Sil, Anoop Kumar
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
248–265
URL:
https://aclanthology.org/2024.naacl-industry.21
Cite (ACL):
Sachin Konan, Larry Rudolph, and Scott Affens. 2024. Automating the Generation of a Functional Semantic Types Ontology with Foundational Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 248–265, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Automating the Generation of a Functional Semantic Types Ontology with Foundational Models (Konan et al., NAACL 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.naacl-industry.21.pdf