Frederic Blum


2022

Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo
Roberto Zariquiey | Claudia Alvarado | Ximena Echevarría | Luisa Gomez | Rosa Gonzales | Mariana Illescas | Sabina Oporto | Frederic Blum | Arturo Oncevay | Javier Vera
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. We then describe the general details of the treebank and the language-specific considerations behind the proposed annotation. Finally, we conduct experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a treebank for Shipibo-Konibo, another Panoan language.
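Universal Dependencies treebanks like this one are distributed in the CoNLL-U format. As a minimal illustration (not code from the paper), the sketch below uses the `conllu` Python package to read a treebank file and tally its universal POS tags; the file name is a placeholder, not the actual location of the Kakataibo data.

```python
# Minimal sketch: tally universal POS tags in a CoNLL-U treebank.
# Requires the `conllu` package (pip install conllu); the file name
# below is a placeholder, not the real path of the Kakataibo treebank.
from collections import Counter

from conllu import parse_incr

upos_counts = Counter()

with open("kakataibo-ud-test.conllu", encoding="utf-8") as f:
    for sentence in parse_incr(f):
        for token in sentence:
            # Skip multiword-token ranges and empty nodes, whose ids
            # are tuples rather than plain integers in `conllu`.
            if isinstance(token["id"], int):
                upos_counts[token["upos"]] += 1

for upos, count in upos_counts.most_common():
    print(f"{upos}\t{count}")
```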

Evaluating zero-shot transfers and multilingual models for dependency parsing and POS tagging within the low-resource language family Tupían
Frederic Blum
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

This work presents two experiments aimed at replicating previous findings on the transferability of dependency parsers and POS taggers trained on closely related languages within the low-resource Tupían language family. The experiments cover both zero-shot settings and multilingual models. Previous studies have found that even a comparably small treebank from a closely related language can improve sequence labelling considerably in such cases. The results for both POS tagging and dependency parsing confirm earlier evidence that the closer the phylogenetic relation between two languages, the better the predictions for sequence labelling tasks become. In many cases, results improve further when multiple languages from the same family are combined. This suggests that, beyond leveraging the similarity between two related languages, incorporating multiple languages of the same family may lead to better transfer learning results for NLP applications.
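The standard metrics behind results like these are POS tagging accuracy and the (un)labelled attachment scores for dependency parsing. As a hedged sketch (not the paper's actual evaluation code), the following computes accuracy, UAS, and LAS by comparing a gold CoNLL-U file against a predicted one; the file names are placeholders, and sentence-level alignment of the two files is assumed.

```python
# Minimal sketch: POS accuracy, UAS, and LAS from gold vs. predicted
# CoNLL-U files. Assumes the two files are sentence- and token-aligned;
# file names are placeholders. Requires `conllu` (pip install conllu).
from conllu import parse_incr

def evaluate(gold_path: str, pred_path: str) -> None:
    pos_correct = uas_correct = las_correct = total = 0
    with open(gold_path, encoding="utf-8") as g, \
         open(pred_path, encoding="utf-8") as p:
        for gold_sent, pred_sent in zip(parse_incr(g), parse_incr(p)):
            for gold_tok, pred_tok in zip(gold_sent, pred_sent):
                if not isinstance(gold_tok["id"], int):
                    continue  # skip multiword-token ranges and empty nodes
                total += 1
                pos_correct += gold_tok["upos"] == pred_tok["upos"]
                head_match = gold_tok["head"] == pred_tok["head"]
                uas_correct += head_match
                las_correct += head_match and (
                    gold_tok["deprel"] == pred_tok["deprel"]
                )
    print(f"POS accuracy: {pos_correct / total:.3f}")
    print(f"UAS: {uas_correct / total:.3f}")
    print(f"LAS: {las_correct / total:.3f}")

evaluate("gold.conllu", "predicted.conllu")
```

In a zero-shot transfer setup, the predicted file would come from a tagger or parser trained only on a related language's treebank and applied unchanged to the target language's test set.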