Fine-Grained Geographical Relation Extraction from Wikipedia

Andre Blessing, Hinrich Schütze


Abstract
In this paper, we present work on enhancing the basic data resource of a context-aware system. Electronic text offers a wealth of information about geospatial data and can be used to improve the completeness and accuracy of geospatial resources (e.g., gazetteers). First, we introduce a supervised approach to extracting geographical relations on a fine-grained level. Second, we present a novel way of using Wikipedia as a corpus based on self-annotation. A self-annotation is an automatically created high-quality annotation that can be used for training and evaluation. Wikipedia contains two types of different context: (i) unstructured text and (ii) structured data: templates (e.g., infoboxes about cities), lists and tables. We use the structured data to annotate the unstructured text. Finally, the extracted fine-grained relations are used to complete gazetteer data. The precision and recall scores of more than 97 percent confirm that a statistical IE pipeline can be used to improve the data quality of community-based resources.
Anthology ID:
L10-1358
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/519_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Andre Blessing and Hinrich Schütze. 2010. Fine-Grained Geographical Relation Extraction from Wikipedia. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Fine-Grained Geographical Relation Extraction from Wikipedia (Blessing & Schütze, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/519_Paper.pdf