Abstract
In this paper, we describe the first Tigrinya language speech corpus designed and developed for speech recognition purposes. Tigrinya, often written as Tigrigna (ትግርኛ) /tɪˈɡrinjə/, belongs to the Semitic branch of the Afro-Asiatic languages and shows the characteristic features of a Semitic language. It is spoken by the ethnic Tigray-Tigrigna people in the Horn of Africa. The paper outlines the corpus design process and reviews related work on speech corpus creation for different languages. The authors also describe the procedures used to create the speech recognition corpus for Tigrinya, an under-resourced language. One hundred and thirty native Tigrinya speakers were recorded for the training and test sets. Each speaker read 100 texts, which consisted of syllabically rich and balanced sentences. A set of ten thousand sentences was used to prepare the prompt sheets. These sentences contained all of the contextual syllables and phones.
- Anthology ID:
- W18-3811
- Volume:
- Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Peter Machonis, Anabela Barreiro, Kristina Kocijan, Max Silberztein
- Venue:
- LR4NLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 78–82
- URL:
- https://aclanthology.org/W18-3811
- Cite (ACL):
- Hafte Abera and Sebsibe H/Mariam. 2018. Design of a Tigrinya Language Speech Corpus for Speech Recognition. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 78–82, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Design of a Tigrinya Language Speech Corpus for Speech Recognition (Abera & H/Mariam, LR4NLP 2018)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W18-3811.pdf