2025
pdf
bib
abs
Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru)
Daniel Menendez
|
Hector Gomez
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper presents the design and development of a Text-to-Speech (TTS) model for Shipibo-Konibo, a low-resource indigenous language spoken mainly in the Peruvian Amazon. Despite the challenge posed by the scarcity of data, the model was trained with over 4 hours of recordings and 3,025 meticulously collected written sentences. The tests results demon strated an intelligibility rate (IR) exceeding 88% and a mean opinion score (MOS) of 4.01, confirming the quality of the audio generated by the model, which comprises the Tacotron 2 spectrogram predictor and the HiFi-GAN vocoder. Furthermore, the potential of this model to be trained in other indigenous languages spoken in Peru is highlighted, opening a promising avenue for the documentation and revitalization of these languages.
2022
pdf
bib
abs
WordNet-QU: Development of a Lexical Database for Quechua Varieties
Nelsi Melgarejo
|
Rodolfo Zevallos
|
Hector Gomez
|
John E. Ortega
Proceedings of the 29th International Conference on Computational Linguistics
In the effort to minimize the risk of extinction of a language, linguistic resources are fundamental. Quechua, a low-resource language from South America, is a language spoken by millions but, despite several efforts in the past, still lacks the resources necessary to build high-performance computational systems. In this article, we present WordNet-QU which signifies the inclusion of Quechua in a well-known lexical database called wordnet. We propose WordNet-QU to be included as an extension to wordnet after demonstrating a manually-curated collection of multiple digital resources for lexical use in Quechua. Our work uses the synset alignment algorithm to compare Quechua to its geographically nearest high-resource language, Spanish. Altogether, we propose a total of 28,582 unique synset IDs divided according to region like so: 20510 for Southern Quechua, 5993 for Central Quechua, 1121 for Northern Quechua, and 958 for Amazonian Quechua.
2018
pdf
bib
Corpus Building and Evaluation of Aspect-based Opinion Summaries from Tweets in Spanish
Daniel Peñaloza
|
Rodrigo López
|
Juanjosé Tenorio
|
Héctor Gómez
|
Arturo Oncevay-Marcos
|
Marco A. Sobrevilla Cabezudo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)