Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation
Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe
Abstract
Geoparsing is a fundamental technique for analyzing geo-entity information in text, which is useful for geographic applications, e.g., tourist spot recommendation. We focus on document-level geoparsing that considers geographic relatedness among geo-entity mentions and present a Japanese travelogue dataset designed for training and evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.- Anthology ID:
- 2024.findings-eacl.35
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2024
- Month:
- March
- Year:
- 2024
- Address:
- St. Julian’s, Malta
- Editors:
- Yvette Graham, Matthew Purver
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 513–532
- Language:
- URL:
- https://aclanthology.org/2024.findings-eacl.35
- DOI:
- Cite (ACL):
- Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, and Taro Watanabe. 2024. Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation. In Findings of the Association for Computational Linguistics: EACL 2024, pages 513–532, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation (Higashiyama et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2024.findings-eacl.35.pdf