BoNC: Bag of N-Characters Model for Word Level Language Identification
Abstract
This paper describes the model submitted by the NLP_BFCAI team for the Kanglish shared task held at ICON 2022. The proposed model uses a simple approach based on word representation. Simple machine learning classification algorithms, namely Random Forests (RF), Support Vector Machines, Stochastic Gradient Descent, and Multi-Layer Perceptron, were implemented. Our submission, RF, ranked fifth among all submissions.
- Anthology ID:
- 2022.icon-wlli.7
- Volume:
- Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
- Month:
- December
- Year:
- 2022
- Address:
- IIIT Delhi, New Delhi, India
- Editors:
- Bharathi Raja Chakravarthi, Abirami Murugappan, Dhivya Chinnappa, Adeep Hane, Prasanna Kumar Kumeresan, Rahul Ponnusamy
- Venue:
- ICON
- Publisher:
- Association for Computational Linguistics
- Pages:
- 34–37
- URL:
- https://aclanthology.org/2022.icon-wlli.7
- Cite (ACL):
- Shimaa Ismail, Mai K. Gallab, and Hamada Nayel. 2022. BoNC: Bag of N-Characters Model for Word Level Language Identification. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, pages 34–37, IIIT Delhi, New Delhi, India. Association for Computational Linguistics.
- Cite (Informal):
- BoNC: Bag of N-Characters Model for Word Level Language Identification (Ismail et al., ICON 2022)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/2022.icon-wlli.7.pdf
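The abstract names a bag of n-characters word representation but does not spell out its construction. As a minimal sketch of one plausible reading, the Python below counts the character n-grams of each word and maps a list of words to count vectors over a shared vocabulary. The function names (`char_ngrams`, `bag_of_n_chars`, `vectorize`), the lowercasing step, and the default n-gram lengths are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word` (hypothetical helper)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bag_of_n_chars(word, n_values=(1, 2, 3)):
    """Count the n-grams of each length in `n_values`; position is discarded.

    Lowercasing and the default lengths are assumptions, not from the paper.
    """
    bag = Counter()
    for n in n_values:
        bag.update(char_ngrams(word.lower(), n))
    return bag


def vectorize(words, n_values=(1, 2, 3)):
    """Map words to count vectors over a shared, sorted n-gram vocabulary."""
    bags = [bag_of_n_chars(w, n_values) for w in words]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for n-grams that a word does not contain.
    return vocab, [[bag[gram] for gram in vocab] for bag in bags]
```

Vectors of this kind could then be passed to any of the classifiers listed in the abstract, such as a random forest.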