TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data

Nikola Ljubešić, Tanja Samardžić, Curdin Derungs


Abstract
In this paper we present a newly developed tool that enables researchers interested in the spatial variation of language to define a geographic perimeter of interest, collect data published within that perimeter from the Twitter streaming API, filter the obtained data by language and country, define and extract variables of interest, and analyse the extracted variables using one spatial statistic and two spatial visualisations. We showcase the tool on the area of the former Yugoslavia and a selection of languages spoken there. By defining the perimeter, the languages and a series of linguistic variables of interest, we demonstrate the tool's data collection, processing and analysis capabilities.
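The following is a minimal sketch of the collect, filter and extract steps described above. It is not TweetGeo's code or API. Several details are assumptions made for illustration. Tweets are simplified dicts with a `(lon, lat)` pair and a `lang` code, rather than the real Twitter JSON. The perimeter is a rough bounding box. The linguistic variable is a hypothetical ekavian/ijekavian pair (`mleko` vs `mlijeko`). Aggregation uses a fixed lon/lat grid. The function names (`in_perimeter`, `extract_variants`, `variant_proportions`) are likewise hypothetical. The sketch does not implement the paper's spatial statistic.

```python
import re
from collections import defaultdict

# Hypothetical perimeter (min_lon, min_lat, max_lon, max_lat); a rough, illustrative
# box around the former Yugoslavia, not the perimeter used in the paper.
PERIMETER = (13.0, 40.8, 23.1, 46.9)

# Hypothetical linguistic variable: competing variants, each detected by a regex.
VARIABLE = {
    "ekavian": re.compile(r"\bmleko\b", re.IGNORECASE),
    "ijekavian": re.compile(r"\bmlijeko\b", re.IGNORECASE),
}


def in_perimeter(lon, lat, box=PERIMETER):
    """Return True if the point lies inside the bounding box (edges inclusive)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def extract_variants(text, variable=VARIABLE):
    """Return the names of all variants of the variable that occur in the text."""
    return [name for name, pattern in variable.items() if pattern.search(text)]


def grid_cell(lon, lat, size=0.5):
    """Map a point to the integer index of a square lon/lat grid cell."""
    return (int(lon // size), int(lat // size))


def variant_proportions(tweets, languages, target, size=0.5):
    """Share of the target variant among all variant occurrences, per grid cell.

    Tweets outside the perimeter or not in one of the given languages are dropped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        lon, lat = tweet["coordinates"]
        if not in_perimeter(lon, lat) or tweet["lang"] not in languages:
            continue
        for variant in extract_variants(tweet["text"]):
            counts[grid_cell(lon, lat, size)][variant] += 1
    return {
        cell: c.get(target, 0) / sum(c.values())
        for cell, c in counts.items()
    }
```

The per-cell proportions are the kind of table that a spatial statistic or a map visualisation could then consume. A real pipeline would more likely filter by country using tweet metadata or polygons than by a simple box.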
Anthology ID:
C16-1322
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
3412–3421
URL:
https://aclanthology.org/C16-1322
Cite (ACL):
Nikola Ljubešić, Tanja Samardžić, and Curdin Derungs. 2016. TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3412–3421, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data (Ljubešić et al., COLING 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/C16-1322.pdf