Abstract
Recent work in geolocation has proposed several hypotheses about which linguistic markers are relevant to detecting where people write from. In this paper, we examine six hypotheses against a corpus consisting of all tweets from the US that are geo-tagged, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
- Anthology ID:
- W17-4409
- Volume:
- Proceedings of the 3rd Workshop on Noisy User-generated Text
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Leon Derczynski, Wei Xu, Alan Ritter, Tim Baldwin
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 62–67
- URL:
- https://aclanthology.org/W17-4409
- DOI:
- 10.18653/v1/W17-4409
- Cite (ACL):
- Bahar Salehi and Anders Søgaard. 2017. Evaluating hypotheses in geolocation on a very large sample of Twitter. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 62–67, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating hypotheses in geolocation on a very large sample of Twitter (Salehi & Søgaard, WNUT 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/W17-4409.pdf