Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition

Tannon Kew, Anastassia Shaitarova, Isabel Meraner, Janis Goldzycher, Simon Clematide, Martin Volk


Abstract
Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries. In this paper, we describe two distinct approaches to geotagging a variety of fine-grained toponyms in a diachronic corpus of alpine texts. By applying a traditional gazetteer-based approach, aided by a few simple heuristics, we attain strong high-precision annotations. Using the output of this earlier system, we adopt a state-of-the-art neural approach in order to facilitate the detection of new toponyms on the basis of context. Additionally, we present the results of preliminary experiments on integrating a small amount of crowdsourced annotations to improve overall performance of toponym recognition in our heritage corpus.
Anthology ID:
W19-9003
Volume:
Proceedings of the Workshop on Language Technology for Digital Historical Archives
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Cristina Vertan, Petya Osenova, Dimitar Iliev
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
11–18
URL:
https://aclanthology.org/W19-9003
DOI:
10.26615/978-954-452-059-5_003
Cite (ACL):
Tannon Kew, Anastassia Shaitarova, Isabel Meraner, Janis Goldzycher, Simon Clematide, and Martin Volk. 2019. Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition. In Proceedings of the Workshop on Language Technology for Digital Historical Archives, pages 11–18, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition (Kew et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/W19-9003.pdf