DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model

Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý

Abstract

In this paper we investigate two approaches to the discrimination of similar languages: an expectation–maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. The two methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
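
The byte-level language model idea named in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' system: the trigram order, the additive smoothing constant, the names `ByteNgramModel` and `classify`, and the toy training sentences are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Byte-level n-gram model with additive smoothing (illustrative only)."""

    def __init__(self, text, n=3, alpha=0.5):
        data = text.encode("utf-8")
        self.n = n
        self.alpha = alpha
        # Count every n-gram of bytes and the (n-1)-byte context preceding its last byte.
        self.ngrams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
        self.contexts = Counter(data[i:i + n - 1] for i in range(len(data) - n + 1))

    def log_prob(self, text):
        """Sum of smoothed log P(byte | previous n-1 bytes) over the text."""
        data = text.encode("utf-8")
        total = 0.0
        for i in range(len(data) - self.n + 1):
            gram = data[i:i + self.n]
            numerator = self.ngrams[gram] + self.alpha
            # 256 possible byte values share the smoothing mass.
            denominator = self.contexts[gram[:-1]] + self.alpha * 256
            total += math.log(numerator / denominator)
        return total


def classify(text, models):
    """Return the language whose model assigns the text the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


# Toy training data (assumed for this sketch; real systems train on full corpora).
models = {
    "en": ByteNgramModel("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "es": ByteNgramModel("el rápido zorro marrón salta sobre el perro perezoso y el gato se sentó"),
}

print(classify("the dog and the cat", models))
print(classify("el perro y el gato", models))
```

Working on bytes rather than words avoids tokenization and handles any script uniformly, which is one reason such models resemble compression-based approaches.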

Anthology ID: W16-4815
Volume: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue: VarDial
Publisher: The COLING 2016 Organizing Committee
Pages: 114–118
URL: https://aclanthology.org/W16-4815
Cite (ACL): Ondřej Herman, Vít Suchomel, Vít Baisa, and Pavel Rychlý. 2016. DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 114–118, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model (Herman et al., VarDial 2016)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/W16-4815.pdf
