CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

Tatiana Kachkovskaia, Daniil Kocharov, Pavel Skrelin, Nina Volskaya


Abstract
This paper describes the recording, processing and annotation of speech data for a new speech corpus, CoRuSS (Corpus of Russian Spontaneous Speech). The corpus is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (aged 16 to 77). Some of the Russian speech corpora currently available contain plain orthographic texts and provide limited annotation, but there are no corpora that provide detailed prosodic annotation of spontaneous conversational speech. This corpus contains 30 hours of high-quality recordings of spontaneous Russian speech; half of it has been transcribed and prosodically labeled. The recordings consist of dialogues between two speakers, monologues (speakers’ self-presentations) and readings of a short phonetically balanced text. Since the corpus is labeled for a wide range of linguistic (phonetic and prosodic) information, it provides a basis for empirical studies of various spontaneous speech phenomena, as well as for comparison with those observed in prepared read speech. Since the corpus is designed as an open-access resource of speech data, it will also make it possible to advance corpus-based analysis of spontaneous speech data across languages, as well as speech technology development.
Anthology ID: L16-1309
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1949–1954
URL: https://aclanthology.org/L16-1309
Cite (ACL): Tatiana Kachkovskaia, Daniil Kocharov, Pavel Skrelin, and Nina Volskaya. 2016. CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1949–1954, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech (Kachkovskaia et al., LREC 2016)
PDF: https://preview.aclanthology.org/ingestion-script-update/L16-1309.pdf