Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe


Abstract
Geoparsing is a fundamental technique for analyzing geo-entity information in text, which is useful for geographic applications, e.g., tourist spot recommendation. We focus on document-level geoparsing that considers geographic relatedness among geo-entity mentions and present a Japanese travelogue dataset designed for training and evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.
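The abstract describes three annotation layers: geo-entity mentions, coreference clusters that group mentions of the same entity, and links from entities to geo-database entries. The sketch below shows one way these layers might nest and how the three reported counts relate. It is a hypothetical illustration, not the dataset's released file format. The class names (`Mention`, `CorefCluster`), the helpers `make_mention` and `summarize`, the `geo_db_id` field, the toy English sentence, and the entry ID `"example:kyoto-station"` are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mention:
    """A geo-entity mention stored as a character span (hypothetical schema)."""
    surface: str
    start: int
    end: int


@dataclass
class CorefCluster:
    """Mentions that refer to the same geo-entity, optionally linked to a geo-database entry."""
    mentions: list[Mention] = field(default_factory=list)
    geo_db_id: Optional[str] = None  # hypothetical entry identifier; None means unlinked


def make_mention(doc: str, surface: str) -> Mention:
    """Locate the first occurrence of `surface` in `doc` and record its span."""
    start = doc.index(surface)
    return Mention(surface, start, start + len(surface))


def summarize(clusters: list[CorefCluster]) -> tuple[int, int, int]:
    """Return (#mentions, #coreference clusters, #clusters linked to the geo-database)."""
    n_mentions = sum(len(c.mentions) for c in clusters)
    n_linked = sum(1 for c in clusters if c.geo_db_id is not None)
    return n_mentions, len(clusters), n_linked


# Toy English document; not taken from the dataset.
doc = "We arrived at Kyoto Station. The station was crowded, so we rested at a small cafe."
clusters = [
    # Two coreferent mentions of one place, linked to a hypothetical database entry.
    CorefCluster(
        [make_mention(doc, "Kyoto Station"), make_mention(doc, "The station")],
        geo_db_id="example:kyoto-station",
    ),
    # An unnamed place, left unlinked in this toy example.
    CorefCluster([make_mention(doc, "a small cafe")]),
]
print(summarize(clusters))  # (3, 2, 1)
```

In this model every mention belongs to exactly one cluster, and linking happens at the cluster level. That is why the mention count is at least the cluster count, which in turn is at least the linked-entity count. The reported figures (12,171 ≥ 6,339 ≥ 2,551) are consistent with that ordering.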
Anthology ID:
2024.findings-eacl.35
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
513–532
URL:
https://aclanthology.org/2024.findings-eacl.35
Cite (ACL):
Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, and Taro Watanabe. 2024. Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation. In Findings of the Association for Computational Linguistics: EACL 2024, pages 513–532, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation (Higashiyama et al., Findings 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2024.findings-eacl.35.pdf