Sabina Oporto


2024

Bilingual dictionaries present several challenges, especially for sign languages and oral languages, where multimodality plays a role. We deployed and tested the first bilingual Peruvian Sign Language (LSP) - Spanish Online Dictionary. The first feature allows the user to introduce a text and receive as a result a list of videos whose glosses are related to the input text or Spanish word. The second feature allows the user to sign in front of the camera and shows the five most probable Spanish translations based on the similarity between the input sign and gloss-labeled sign videos used to train a machine learning model. These features are constructed in a design and architecture that differentiates among the coincidence for the Spanish text searched, the sign gloss, and Spanish translation. We explain in depth how these concepts or database columns impact the search. Similarly, we share the challenges of deploying a real-world machine learning model for isolated sign language recognition through Amazon Web Services (AWS).

2022

In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective to create a treebank in the context of a Computational Linguistic course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. We finally conduct some experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a Shipibo-Konibo treebank, another Panoan language resource.