Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning

Alistair Plum, Tharindu Ranasinghe, Constantin Orasan


Abstract
This paper compares how different machine learning classifiers can be combined with simple string matching and named entity recognition to detect locations in text. We compare five state-of-the-art machine learning classifiers on the task of predicting whether a sentence contains a location. Following this classification step, we use a string matching algorithm with a gazetteer to identify the exact index of each toponym within the sentence. We evaluate different choices of machine learning classifier, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation.
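The two-stage idea described in the abstract (first decide whether a sentence contains a location, then locate the toponym by matching against a gazetteer) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the toy `GAZETTEER`, the helper names, and in particular `contains_location`, which is a crude stand-in for the trained classifiers the paper compares, not the authors' implementation.

```python
import re

# Hypothetical toy gazetteer; the gazetteer used in the paper is not given here.
GAZETTEER = ["New York", "York", "Varna", "Bulgaria"]


def build_pattern(names):
    """Compile one regex that matches any gazetteer entry as a whole word."""
    # Longest names first, so "New York" is preferred over "York".
    ordered = sorted(set(names), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def contains_location(sentence):
    """Placeholder for the sentence classifier (assumption, not the paper's model).

    Returns True if any character after the first is upper case.
    """
    return any(ch.isupper() for ch in sentence[1:])


def find_toponyms(sentence, pattern):
    """Return (start, end, text) character spans for gazetteer matches."""
    if not contains_location(sentence):
        return []  # stage 1 rejected the sentence, so stage 2 is skipped
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(sentence)]


PATTERN = build_pattern(GAZETTEER)
spans = find_toponyms("Samples were collected in Varna, Bulgaria.", PATTERN)
```

In this sketch `spans` holds character offsets into the sentence, so each match can be mapped back to a position in the source document. Filtering sentences first keeps the string matcher from firing on sentences the classifier considers location-free, which is the motivation for the hybrid design.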
Anthology ID:
R19-1106
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
912–921
URL:
https://aclanthology.org/R19-1106
DOI:
10.26615/978-954-452-056-4_106
Cite (ACL):
Alistair Plum, Tharindu Ranasinghe, and Constantin Orasan. 2019. Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 912–921, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning (Plum et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/R19-1106.pdf