Theodoros Kostoulas

2010

pdf bib abs
Vergina: A Modern Greek Speech Database for Speech Synthesis
Alexandros Lazaridis | Theodoros Kostoulas | Todor Ganchev | Iosif Mporas | Nikos Fakotakis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The present paper outlines the Vergina speech database, which was developed in support of research and development of corpus-based unit selection and statistical parametric speech synthesis systems for Modern Greek language. In the following, we describe the design, development and implementation of the recording campaign, as well as the annotation of the database. Specifically, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals, and paragraphs of literature, was processed in order to select the utterances-sentences needed for producing the speech database and to achieve a reasonable phonetic coverage. The broad coverage and contents of the selected utterances-sentences of the database ― text corpus collected from different domains and writing styles ― makes this database appropriate for various application domains. The database, recorded in audio studio, consists of approximately 3,000 phonetically balanced Modern Greek utterances corresponding to approximately four hours of speech. Annotation of the Vergina speech database was performed using task-specific tools, which are based on a hidden Markov model (HMM) segmentation method, and then manual inspection and corrections were performed.

pdf bib abs
The PlayMancer Database: A Multimodal Affect Database in Support of Research and Development Activities in Serious Game Environment
Theodoros Kostoulas | Otilia Kocsis | Todor Ganchev | Fernando Fernández-Aranda | Juan J. Santamaría | Susana Jiménez-Murcia | Maher Ben Moussa | Nadia Magnenat-Thalmann | Nikos Fakotakis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The present paper reports on a recent effort that resulted in the establishment of a unique multimodal affect database, referred to as the PlayMancer database. This database was created in support of the research and development activities, taking place within the PlayMancer project, which aim at the development of a serious game environment in support of treatment of patients with behavioural and addictive disorders, such as eating disorders and gambling addictions. Specifically, for the purpose of data collection, we designed and implemented a pilot trial with healthy test subjects. Speech, video and bio-signals (pulse-rate, SpO2) were captured synchronously, during the interaction of healthy people with a number of video games. The collected data were annotated by the test subjects (self-annotation), targeting proper interpretation of the underlying affective states. The broad-shouldered design of the PlayMancer database allows its use for the needs of research on multimodal affect-emotion recognition and multimodal human-computer interaction in serious games environment.

2008

pdf bib abs
A Real-World Emotional Speech Corpus for Modern Greek
Theodoros Kostoulas | Todor Ganchev | Iosif Mporas | Nikos Fakotakis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The present paper deals with the design and the annotation of a Greek real-world emotional speech corpus. The speech data consist of recordings collected during the interaction of naïve users with a smart-home dialogue system. Annotation of the speech data with respect to the uttered command and emotional state was performed. Initial experimentations towards recognizing negative emotional states were performed and the experimental results indicate the range of difficulties when dealing with real-world data.

A speech and noise corpus dealing with the extreme conditions of the motorcycle environment is developed within the MoveOn project. Speech utterances in British English are recorded and processed approaching the issue of command and control and template driven dialog systems on the motorcycle. The major part of the corpus comprises noisy speech and environmental noise recorded on a motorcycle, but several clean speech recordings in a silent environment are also available. The corpus development focuses on distortion free recordings and accurate descriptions of both recorded speech and noise. Not only speech segments are annotated but also annotation of environmental noise is performed. The corpus is a small-sized speech corpus with about 12 hours of clean and noisy speech utterances and about 30 hours of segments with environmental noise without speech. This paper addresses the motivation and development of the speech corpus and finally presents some statistics and results of the database creation.