Abstract
This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation. Following our successful participation at VarDial 2020, we again propose constrained and unconstrained systems based on the BERT architecture. In this paper, we report experiments with different tokenization settings and different pre-trained models, and we contrast our parameter-free regression approach with various classification schemes proposed by other participants at VarDial 2020. Both the code and the best-performing pre-trained models are made freely available.
- Anthology ID:
- 2021.vardial-1.16
- Volume:
- Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv, Ukraine
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 135–140
- URL:
- https://aclanthology.org/2021.vardial-1.16
- Cite (ACL):
- Yves Scherrer and Nikola Ljubešić. 2021. Social Media Variety Geolocation with geoBERT. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 135–140, Kyiv, Ukraine. Association for Computational Linguistics.
- Cite (Informal):
- Social Media Variety Geolocation with geoBERT (Scherrer & Ljubešić, VarDial 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.vardial-1.16.pdf