Discriminating between Indo-Aryan Languages Using SVM Ensembles
Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu
Abstract
In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. The system competed in the Indo-Aryan Language Identification (ILI) shared task organized within the VarDial Evaluation Campaign 2018. Our best entry in the competition, named ILIdentification, scored 88.95% F1 score and it was ranked 3rd out of 8 teams.- Anthology ID:
- W18-3920
- Volume:
- Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
- Venue:
- VarDial
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 178–184
- Language:
- URL:
- https://aclanthology.org/W18-3920
- DOI:
- Cite (ACL):
- Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, and Liviu P. Dinu. 2018. Discriminating between Indo-Aryan Languages Using SVM Ensembles. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 178–184, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Discriminating between Indo-Aryan Languages Using SVM Ensembles (Ciobanu et al., VarDial 2018)
- PDF:
- https://preview.aclanthology.org/naacl24-info/W18-3920.pdf