LILI: A Simple Language Independent Approach for Language Identification

Mohamed Al-Badrashiny; Mona Diab

Abstract

We introduce a generic language-independent framework for linguistic code-switch point detection. The system uses character-level 5-gram and word-level unigram language models to train a conditional random fields (CRF) model that classifies input words into various languages. We test the proposed framework and compare it to state-of-the-art published systems on standard data sets from several language pairs: English-Spanish, Nepali-English, English-Hindi, Arabizi (Arabic written in the Latin/Roman script)-English, Arabic-Engari (English written in the Arabic script), Modern Standard Arabic (MSA)-Egyptian, Levantine-MSA, Gulf-MSA, a second English-Spanish set, and a second MSA-Egyptian set. The overall weighted average F-scores for these language pairs are 96.4%, 97.3%, 98.0%, 97.0%, 98.9%, 86.3%, 88.2%, 90.6%, 95.2%, and 85.0%, respectively. The results show that our approach, despite its simplicity, either outperforms or performs at comparable levels to state-of-the-art published systems.
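To make the feature idea in the abstract concrete, the following is a minimal sketch of how per-language character 5-gram and word unigram scores could be turned into token-level features for a sequence labeler such as a CRF. It is not the authors' implementation: the class names, the add-one smoothing, the boundary padding, the feature names, and the toy word lists are all assumptions made for illustration, and the CRF training step itself is left out.

```python
import math
from collections import Counter


def char_ngrams(word, n=5):
    """Character n-grams of a lowercased word, padded with boundary markers."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Add-one smoothed character n-gram model (smoothing choice is an assumption)."""

    def __init__(self, words, n=5):
        self.n = n
        self.ngrams = Counter()
        self.contexts = Counter()
        chars = {"$"}
        for w in words:
            chars.update(w.lower())
            for g in char_ngrams(w, n):
                self.ngrams[g] += 1
                self.contexts[g[:-1]] += 1
        self.vocab_size = len(chars) + 1  # +1 reserves mass for unseen characters

    def avg_logprob(self, word):
        """Length-normalised log-probability of the word under this model."""
        grams = char_ngrams(word, self.n)
        total = sum(
            math.log((self.ngrams[g] + 1) / (self.contexts[g[:-1]] + self.vocab_size))
            for g in grams
        )
        return total / len(grams)


class WordUnigramLM:
    """Add-one smoothed word unigram model."""

    def __init__(self, words):
        self.counts = Counter(w.lower() for w in words)
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # +1 for unseen words

    def logprob(self, word):
        return math.log((self.counts[word.lower()] + 1) / (self.total + self.vocab_size))


def sentence_features(tokens, char_lms, word_lms):
    """One feature dict per token; a CRF toolkit would consume these with labels."""
    features = []
    for tok in tokens:
        feats = {}
        for lang in char_lms:
            feats["char5_" + lang] = char_lms[lang].avg_logprob(tok)
            feats["uni_" + lang] = word_lms[lang].logprob(tok)
        features.append(feats)
    return features


# Toy monolingual word lists (hypothetical data, far too small for real use).
EN_WORDS = ["the", "house", "is", "very", "big", "and", "white"]
ES_WORDS = ["la", "casa", "es", "muy", "grande", "y", "blanca"]

char_lms = {"en": CharNgramLM(EN_WORDS), "es": CharNgramLM(ES_WORDS)}
word_lms = {"en": WordUnigramLM(EN_WORDS), "es": WordUnigramLM(ES_WORDS)}

features = sentence_features(["the", "casa"], char_lms, word_lms)
for token, feats in zip(["the", "casa"], features):
    print(token, {k: round(v, 3) for k, v in feats.items()})
```

In this sketch each token receives one score per language from each model, so adding a language pair only requires training two more small models. A sequence model on top can then use neighbouring tokens' scores to place the switch points.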

Anthology ID: C16-1115
Volume: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Yuji Matsumoto, Rashmi Prasad
Venue: COLING
Publisher: The COLING 2016 Organizing Committee
Pages: 1211–1219
URL: https://preview.aclanthology.org/add_missing_videos/C16-1115/
Cite (ACL): Mohamed Al-Badrashiny and Mona Diab. 2016. LILI: A Simple Language Independent Approach for Language Identification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1211–1219, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): LILI: A Simple Language Independent Approach for Language Identification (Al-Badrashiny & Diab, COLING 2016)
PDF: https://preview.aclanthology.org/add_missing_videos/C16-1115.pdf