From Text to Maps: LLM-Driven Extraction and Geotagging of Epidemiological Data

Karlyn K. Harrod, Prabin Bhandari, Antonios Anastasopoulos


Abstract
Epidemiological datasets are essential for public health analysis and decision-making, yet they remain scarce and often difficult to compile due to inconsistent data formats, language barriers, and evolving political boundaries. Traditional methods of creating such datasets involve extensive manual effort and are prone to errors in accurate location extraction. To address these challenges, we propose utilizing large language models (LLMs) to automate the extraction and geotagging of epidemiological data from textual documents. Our approach significantly reduces the manual effort required, limiting human intervention to validating a subset of records against text snippets and verifying the geotagging reasoning, as opposed to reviewing multiple entire documents manually to extract, clean, and geotag. Additionally, the LLMs identify information often overlooked by human annotators, further enhancing the dataset’s completeness. Our findings demonstrate that LLMs can be effectively used to semi-automate the extraction and geotagging of epidemiological data, offering several key advantages: (1) comprehensive information extraction with minimal risk of missing critical details; (2) minimal human intervention; (3) higher-resolution data with more precise geotagging; and (4) significantly reduced resource demands compared to traditional methods.
Anthology ID:
2024.nlp4pi-1.24
Volume:
Proceedings of the Third Workshop on NLP for Positive Impact
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venue:
NLP4PI
Publisher:
Association for Computational Linguistics
Pages:
258–270
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.nlp4pi-1.24/
DOI:
10.18653/v1/2024.nlp4pi-1.24
Cite (ACL):
Karlyn K. Harrod, Prabin Bhandari, and Antonios Anastasopoulos. 2024. From Text to Maps: LLM-Driven Extraction and Geotagging of Epidemiological Data. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 258–270, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
From Text to Maps: LLM-Driven Extraction and Geotagging of Epidemiological Data (Harrod et al., NLP4PI 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.nlp4pi-1.24.pdf