A Practice of Tourism Knowledge Graph Construction based on Heterogeneous Information
Dinghe Xiao, Nannan Wang, Jiangang Yu, Chunhong Zhang, Jiaqi Wu
Abstract
The growing volume of semi-structured and unstructured data on tourism websites creates a need for information extraction (IE) to construct a Tourism-domain Knowledge Graph (TKG), which helps manage tourism information and supports downstream applications such as tourism search engines, recommendation, and Q&A. However, existing TKGs are deficient, and few open methods are available to promote the construction and widespread application of TKGs. In this paper, we present a systematic framework for building a TKG for Hainan, collecting data from popular tourism websites and structuring it into triples. The data is multi-source and heterogeneous, which makes it challenging to process, so we develop two processing pipelines, one for semi-structured data and one for unstructured data. We draw on tourism InfoBoxes for semi-structured knowledge extraction and leverage deep learning algorithms to extract entities and relations from unstructured travel notes, which are colloquial and noisy; we then fuse the knowledge extracted from the two sources. The resulting TKG has 13 entity types and 46 relation types and contains 34,079 entities and 441,371 triples in total. The systematic procedure proposed in this paper can construct a TKG from tourism websites, which can be further applied to many scenarios, and it provides a detailed reference for the construction of other domain-specific knowledge graphs.
- Anthology ID: 2020.ccl-1.87
- Volume: Proceedings of the 19th Chinese National Conference on Computational Linguistics
- Month: October
- Year: 2020
- Address: Haikou, China
- Editors: Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
- Venue: CCL
- Publisher: Chinese Information Processing Society of China
- Pages: 939–949
- Language: English
- URL: https://aclanthology.org/2020.ccl-1.87
- Cite (ACL): Dinghe Xiao, Nannan Wang, Jiangang Yu, Chunhong Zhang, and Jiaqi Wu. 2020. A Practice of Tourism Knowledge Graph Construction based on Heterogeneous Information. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 939–949, Haikou, China. Chinese Information Processing Society of China.
- Cite (Informal): A Practice of Tourism Knowledge Graph Construction based on Heterogeneous Information (Xiao et al., CCL 2020)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2020.ccl-1.87.pdf