N-gram and Neural Language Models for Discriminating Similar Languages

Andre Cianflone, Leila Kosseim


Abstract
This paper describes our submission to the 2016 Discriminating Similar Languages (DSL) Shared Task. We participated in the closed Sub-task 1 with two separate machine learning techniques. The first approach is a character-based Convolutional Neural Network with an LSTM layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model of size 7. It achieved an accuracy of 88.45%, which is close to the accuracy of 89.38% achieved by the best submission.
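To make the second approach concrete, the following is a minimal sketch of a character n-gram language identifier. It is an illustration only, not the authors' implementation: the abstract does not state the smoothing or scoring scheme, so add-alpha smoothing and maximum log-likelihood selection are assumptions here, and the names `char_ngrams`, `CharNgramClassifier`, `alpha`, and the `^`/`$` padding symbols are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`.

    The text is padded with n-1 start symbols and one end symbol
    (an assumption of this sketch, not taken from the paper).
    """
    padded = "^" * (n - 1) + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramClassifier:
    """One character n-gram model per language; predict the most likely one."""

    def __init__(self, n=7, alpha=1.0):
        self.n = n                # n-gram size (7 in the abstract)
        self.alpha = alpha        # add-alpha smoothing constant (assumed)
        self.ngram_counts = defaultdict(Counter)    # label -> n-gram counts
        self.context_counts = defaultdict(Counter)  # label -> (n-1)-char context counts
        self.alphabet = set()     # all characters seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for gram in char_ngrams(text, self.n):
                self.ngram_counts[label][gram] += 1
                self.context_counts[label][gram[:-1]] += 1
                self.alphabet.update(gram)
        return self

    def log_likelihood(self, text, label):
        """Smoothed log P(text | label) under that language's n-gram model."""
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for gram in char_ngrams(text, self.n):
            num = self.ngram_counts[label][gram] + self.alpha
            den = self.context_counts[label][gram[:-1]] + self.alpha * vocab
            total += math.log(num / den)
        return total

    def predict(self, text):
        """Return the label whose model assigns `text` the highest likelihood."""
        return max(self.ngram_counts,
                   key=lambda label: self.log_likelihood(text, label))
```

A model of this kind would be trained with `CharNgramClassifier(n=7).fit(train_texts, train_labels)` and queried with `predict(sentence)`, where `train_texts` and `train_labels` stand for whatever labelled sentences are available. With very little training data a smaller `n` is more practical, because most 7-grams in a new sentence will be unseen.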
Anthology ID:
W16-4831
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
243–250
URL:
https://aclanthology.org/W16-4831
Cite (ACL):
Andre Cianflone and Leila Kosseim. 2016. N-gram and Neural Language Models for Discriminating Similar Languages. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 243–250, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
N-gram and Neural Language Models for Discriminating Similar Languages (Cianflone & Kosseim, VarDial 2016)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W16-4831.pdf