BoNC: Bag of N-Characters Model for Word Level Language Identification

Shimaa Ismail, Mai K. Gallab, Hamada Nayel


Abstract
This paper describes the model submitted by the NLP_BFCAI team to the Kanglish shared task held at ICON 2022. The proposed model uses a very simple approach based on word representation. Simple machine learning classification algorithms, namely Random Forests (RF), Support Vector Machines, Stochastic Gradient Descent and Multi-Layer Perceptron, were implemented. Our submission, RF, ranked fifth among all submissions.
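As an illustration of what a bag of N-characters word representation might look like, the following is a minimal sketch and not the authors' implementation. The abstract does not give the n-gram range, casing, or boundary handling, so the function names `char_ngrams` and `vectorize`, the default range of 1 to 3, the lowercasing, and the `<`/`>` boundary markers are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return a bag (multiset) of character n-grams for one word.

    Assumption: the word is lowercased and wrapped in '<' and '>' so that
    prefixes and suffixes yield distinct n-grams.
    """
    padded = f"<{word.lower()}>"
    bag = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            bag[padded[i:i + n]] += 1
    return bag


def vectorize(words, n_min=1, n_max=3):
    """Turn a list of words into count vectors over a shared n-gram vocabulary."""
    bags = [char_ngrams(w, n_min, n_max) for w in words]
    vocab = sorted(set().union(*bags))
    index = {gram: j for j, gram in enumerate(vocab)}
    rows = []
    for bag in bags:
        row = [0] * len(vocab)
        for gram, count in bag.items():
            row[index[gram]] = count
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical romanized tokens, used only to exercise the functions.
    vocab, rows = vectorize(["hogi", "going"], n_min=2, n_max=2)
    print(len(vocab), rows)
```

Each row of the resulting count matrix represents one word and could be passed, together with per-word language labels, to any of the classifiers named in the abstract.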
Anthology ID:
2022.icon-wlli.7
Volume:
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
Month:
December
Year:
2022
Address:
IIIT Delhi, New Delhi, India
Editors:
Bharathi Raja Chakravarthi, Abirami Murugappan, Dhivya Chinnappa, Adeep Hande, Prasanna Kumar Kumaresan, Rahul Ponnusamy
Venue:
ICON
Publisher:
Association for Computational Linguistics
Pages:
34–37
URL:
https://aclanthology.org/2022.icon-wlli.7
Cite (ACL):
Shimaa Ismail, Mai K. Gallab, and Hamada Nayel. 2022. BoNC: Bag of N-Characters Model for Word Level Language Identification. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, pages 34–37, IIIT Delhi, New Delhi, India. Association for Computational Linguistics.
Cite (Informal):
BoNC: Bag of N-Characters Model for Word Level Language Identification (Ismail et al., ICON 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.icon-wlli.7.pdf