Language Identification of Philippine Creole Spanish: Discriminating Chavacano From Related Languages

Aileen Joan Vicente, Charibeth Cheng


Abstract
Chavacano is a Spanish-based creole widely spoken in the southern Philippines. It is one of many Philippine languages that have yet to be studied computationally. This paper presents a language identification model that uses character-level convolutional networks to distinguish Chavacano from the languages that influenced its creolization. Unlike studies that discriminate similar languages grouped by geographical proximity, this paper treats similarity as a product of creolization. We established the similarity between Chavacano and its related languages (Spanish, Portuguese, Cebuano, and Hiligaynon) from the number of words they share in the corpus. The model trained with ten filters of width 5 achieves an accuracy of 93%. The training experiments show that increasing the filter width, the number of filters, or the number of training epochs is unnecessary: although accuracy increases, the resulting models exhibit irregular learning behavior or may already be overfitted. This study also demonstrates that the character features extracted by convolutional neural networks, which are similar to n-grams, are sufficient for identifying Chavacano. Future work includes improving classification accuracy on short or code-switched texts for practical applications such as social media sensors for disaster response and management.
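The abstract measures similarity by counting the words Chavacano shares with each related language. The following is a minimal sketch of one way such a count could be computed, not the authors' procedure: the tokenization rule, the function names `vocabulary` and `shared_word_counts`, and the toy sentences are all assumptions made for illustration, and the abstract does not say whether word types or tokens were counted.

```python
import re


def vocabulary(text):
    """Return the set of lowercased word types in a text.

    Assumption: a "word" is any run of Unicode letters; the paper's
    actual tokenization is not described in the abstract.
    """
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def shared_word_counts(corpora, target):
    """Count word types that `target` shares with every other language."""
    target_vocab = vocabulary(corpora[target])
    return {
        lang: len(target_vocab & vocabulary(text))
        for lang, text in corpora.items()
        if lang != target
    }


# Toy, made-up sentences for illustration only; not data from the paper.
corpora = {
    "chavacano": "el casa grande",
    "spanish": "la casa es grande",
    "cebuano": "dako ang balay",
}

print(shared_word_counts(corpora, "chavacano"))
```

On a real corpus, raw counts of this kind depend on corpus size, so a normalized measure such as the shared fraction of the target vocabulary may be easier to compare across languages.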
Anthology ID:
2024.vardial-1.16
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
188–196
URL:
https://aclanthology.org/2024.vardial-1.16
Cite (ACL):
Aileen Joan Vicente and Charibeth Cheng. 2024. Language Identification of Philippine Creole Spanish: Discriminating Chavacano From Related Languages. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 188–196, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Language Identification of Philippine Creole Spanish: Discriminating Chavacano From Related Languages (Vicente & Cheng, VarDial-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.vardial-1.16.pdf
Supplementary material:
 2024.vardial-1.16.SupplementaryMaterial.txt