DECODA: a call-centre human-human spoken conversation corpus

Frederic Bechet, Benjamin Maza, Nicolas Bigouroux, Thierry Bazillon, Marc El-Bèze, Renato De Mori, Eric Arbillot


Abstract
The goal of the DECODA project is to reduce the development cost of Speech Analytics systems by reducing the need for manual annotation. This project aims to propose robust speech data mining tools in the framework of call-center monitoring and evaluation, by means of weakly supervised methods. The applicative framework of the project is the call-center of the RATP (Paris public transport authority). This project tackles two very important open issues in the development of speech mining methods from spontaneous speech recorded in call-centers: robustness (how to extract relevant information from very noisy and spontaneous speech messages) and weak supervision (how to reduce the annotation effort needed to train and adapt recognition and classification models). This paper describes the DECODA corpus collected at the RATP during the project. We present the different annotation levels performed on the corpus, the methods used to obtain them, as well as some evaluation of the quality of the annotations produced.
Anthology ID:
L12-1399
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1343–1347
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/684_Paper.pdf
Cite (ACL):
Frederic Bechet, Benjamin Maza, Nicolas Bigouroux, Thierry Bazillon, Marc El-Bèze, Renato De Mori, and Eric Arbillot. 2012. DECODA: a call-centre human-human spoken conversation corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1343–1347, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
DECODA: a call-centre human-human spoken conversation corpus (Bechet et al., LREC 2012)