BoNC: Bag of N-Characters Model for Word Level Language Identification

Shimaa Ismail; Mai K. Gallab; Hamada Nayel

Abstract

This paper describes the model submitted by the NLP_BFCAI team for the Kanglish shared task held at ICON 2022. The proposed model uses a very simple approach based on word representation. Simple machine learning classification algorithms, namely Random Forests (RF), Support Vector Machines, Stochastic Gradient Descent and Multi-Layer Perceptron, have been implemented. Our submission, RF, ranked fifth among all submissions.
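The abstract does not spell out how words are represented, so the following is only a minimal sketch of what a bag of character n-grams for a single word could look like, taking the "Bag of N-Characters" name in the title at face value. The function name `char_ngrams`, the n-gram range, the lowercasing step and the example word are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return a bag (multiset) of character n-grams for one word.

    Hypothetical helper: the n-gram range and the lowercasing are
    illustrative assumptions, not settings reported by the authors.
    """
    word = word.lower()
    bag = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            bag[word[i:i + n]] += 1
    return bag


# Example: character bigrams of a romanized word.
bigram_bag = char_ngrams("banni", n_min=2, n_max=2)
print(sorted(bigram_bag.items()))
```

Counts like these could be mapped to fixed-length vectors over a shared n-gram vocabulary and passed to any of the classifiers named in the abstract, such as a Random Forest.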

Anthology ID: 2022.icon-wlli.7
Volume: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
Month: December
Year: 2022
Address: IIIT Delhi, New Delhi, India
Editors: Bharathi Raja Chakravarthi, Abirami Murugappan, Dhivya Chinnappa, Adeep Hane, Prasanna Kumar Kumeresan, Rahul Ponnusamy
Venue: ICON
Publisher: Association for Computational Linguistics
Pages: 34–37
URL: https://aclanthology.org/2022.icon-wlli.7
Cite (ACL): Shimaa Ismail, Mai K. Gallab, and Hamada Nayel. 2022. BoNC: Bag of N-Characters Model for Word Level Language Identification. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, pages 34–37, IIIT Delhi, New Delhi, India. Association for Computational Linguistics.
Cite (Informal): BoNC: Bag of N-Characters Model for Word Level Language Identification (Ismail et al., ICON 2022)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2022.icon-wlli.7.pdf
