Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann


Abstract
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial’2016 workshop at COLING’2016. The challenge offered two subtasks: subtask 1 focused on the identification of very similar languages and language varieties in newswire texts, whereas subtask 2 dealt with Arabic dialect identification in speech transcripts. A total of 37 teams registered to participate in the task, 24 teams submitted test results, and 20 teams also wrote system description papers. High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.
Anthology ID:
W16-4801
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
1–14
URL:
https://aclanthology.org/W16-4801
Cite (ACL):
Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, and Jörg Tiedemann. 2016. Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 1–14, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task (Malmasi et al., VarDial 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W16-4801.pdf