Emre Yilmaz

2021

2020

Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages
Astik Biswas | Emre Yilmaz | Febe De Wet | Ewald Van der westhuizen | Thomas Niesler
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper reports on the semi-supervised development of acoustic and language models for under-resourced, code-switched speech in five South African languages. Two approaches are considered. The first constructs four separate bilingual automatic speech recognisers (ASRs) corresponding to four different language pairs between which speakers switch frequently. The second uses a single, unified, five-lingual ASR system that represents all the languages (English, isiZulu, isiXhosa, Setswana and Sesotho). We evaluate the effectiveness of these two approaches when used to add additional data to our extremely sparse training sets. Results indicate that batch-wise semi-supervised training yields better results than a non-batch-wise approach. Furthermore, while the separate bilingual systems achieved better recognition performance than the unified system, they benefited more from pseudolabels generated by the five-lingual system than from those generated by the bilingual systems.

Semi-supervised Acoustic Modelling for Five-lingual Code-switched ASR using Automatically-segmented Soap Opera Speech
Nick Wilkinson | Astik Biswas | Emre Yilmaz | Febe De Wet | Ewald Van der westhuizen | Thomas Niesler
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

This paper considers the impact of automatic segmentation on the fully-automatic, semi-supervised training of automatic speech recognition (ASR) systems for five-lingual code-switched (CS) speech. Four automatic segmentation techniques were evaluated in terms of the recognition performance of an ASR system trained on the resulting segments in a semi-supervised manner. For comparative purposes a semi-supervised system [...] Three of these use a newly proposed convolutional neural network (CNN) model for framewise classification, and include a novel form of HMM smoothing of the CNN outputs. Automatic segmentation was applied in combination with automatic speaker diarization. The best-performing segmentation technique was also evaluated without speaker diarization. An evaluation based on 248 unsegmented soap opera episodes indicated that voice activity detection (VAD) based on a CNN followed by Gaussian mixture model-hidden Markov model smoothing (CNN-GMM-HMM) yields the best ASR performance. The semi-supervised system trained with the best automatic segmentation achieved an overall WER improvement of 1.1% absolute over a semi-supervised system trained with manually created segments. Furthermore, we found that recognition rates improved even further when the automatic segmentation was used in conjunction with speaker diarization.

2016

A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
Emre Yilmaz | Mario Ganzeboom | Lilian Beijer | Catia Cucchiarini | Helmer Strik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a new Dutch dysarthric speech database containing utterances of neurological patients with Parkinson’s disease, traumatic brain injury and cerebrovascular accident. The speech content is phonetically and linguistically diversified by using numerous structured sentence and word lists. Containing more than 6 hours of mildly to moderately dysarthric speech, this database can be used for research on dysarthria and for developing and testing speech-to-text systems designed for medical applications. Current activities aimed at extending this database are also discussed.

We present a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province Fryslan and it is the second official language of the Netherlands. The recordings are collected from the archives of Omrop Fryslan, the regional public broadcaster of the province Fryslan. The database covers almost a 50-year time span. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. Considering the longitudinal and code-switching nature of the data, an appropriate annotation protocol has been designed and the data is manually annotated with the orthographic transcription, speaker identities, dialect information, code-switching details and background noise/music information.

2013

Automating speech reception threshold measurements using automatic speech recognition
Hanne Deprez | Emre Yilmaz | Stefan Lievens | Hugo Van hamme
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies
