Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers

Adrien Barbaresi


Abstract
This contribution presents efficient approaches to language classification that were field-tested in the VarDial evaluation campaign. The methods used in several language identification tasks covering different language types are presented and their results discussed, giving insights into the real-world application of regularization, linear classifiers and corresponding linguistic features. A specially adapted Ridge classifier proved useful in two of the three tasks. The overall approach (XAC) slightly outperformed most of the other systems on the DFS task (Dutch and Flemish) and on the ILI task (Indo-Aryan languages), while its comparative performance was poorer on the GDI task (Swiss German dialects).
Anthology ID:
W18-3918
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
164–171
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/W18-3918/
Cite (ACL):
Adrien Barbaresi. 2018. Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 164–171, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers (Barbaresi, VarDial 2018)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/W18-3918.pdf