Abstract
In this paper we present a newly developed tool that enables researchers interested in the spatial variation of language to define a geographic perimeter of interest, collect data published within that perimeter from the Twitter streaming API, filter the obtained data by language and country, define and extract variables of interest, and analyse the extracted variables with one spatial statistic and two spatial visualisations. We showcase the tool on the area of the former Yugoslavia and a selection of languages spoken there. By defining the perimeter, the languages and a series of linguistic variables of interest, we demonstrate the data collection, processing and analysis capabilities of the tool.
- Anthology ID:
- C16-1322
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 3412–3421
- URL:
- https://aclanthology.org/C16-1322
- Cite (ACL):
- Nikola Ljubešić, Tanja Samardžić, and Curdin Derungs. 2016. TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3412–3421, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data (Ljubešić et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/C16-1322.pdf
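The abstract describes collecting tweets published within a geographic perimeter and then filtering them by language and country. The following is a minimal, hypothetical sketch of that filtering step, not the TweetGeo implementation. It assumes tweets arrive as parsed Twitter API v1.1 JSON objects, where `coordinates` is a GeoJSON point in longitude, latitude order, `lang` holds a language label and `place.country_code` holds the country. The helper names `BBox` and `filter_tweets` are illustrative. The perimeter is simplified to a rectangular bounding box, and the language label is taken as given from whatever upstream identifier produced it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BBox:
    """Hypothetical rectangular perimeter in decimal degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Inclusive bounds on both axes.
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


def filter_tweets(
    tweets: Iterable[dict],
    bbox: BBox,
    langs: set[str],
    countries: set[str],
) -> Iterator[dict]:
    """Yield tweets with a point location inside bbox, an accepted
    language label and an accepted country code (assumed v1.1 fields)."""
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if not coords:
            continue  # no exact point location attached
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude, latitude
        if not bbox.contains(lon, lat):
            continue
        if tweet.get("lang") not in langs:
            continue
        place = tweet.get("place") or {}
        if place.get("country_code") not in countries:
            continue
        yield tweet
```

A rough, illustrative box around the former Yugoslavia would be `BBox(13.3, 40.8, 23.1, 46.9)`, combined with language codes such as `{"hr", "sr", "bs"}` and country codes such as `{"HR", "RS", "BA"}`. Because `filter_tweets` is a generator, it can be applied lazily to a live stream without holding all tweets in memory.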