Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French

Solene Virginie Evain, Solange Rossato, François Portet


Abstract
Many papers on speech processing use ‘spontaneous speech’ as a catch-all term for situations like speaking with a friend, being interviewed on radio/TV or giving a lecture. However, the performance of Automatic Speech Recognition (ASR) systems seems to vary on this type of speech: the more spontaneous the speech, the higher the WER (Word Error Rate). Our study aims to better understand the elements that influence the level of spontaneity, in order to evaluate the relation between categories of spontaneity and ASR system performance and to improve recognition on those categories. We first analyzed the literature, listed and unraveled those elements, and identified four axes: the situation of communication, the level of intimacy between speakers, the channel and the type of communication. Then, we trained ASR systems and measured the impact on WER of instances of face-to-face interaction labeled along these dimensions (different levels of spontaneity). We varied two of the axes and found that both have an impact on the WER. The situation of communication seems to have the biggest impact on spontaneity: ASR systems give better results for situations like an interview than for friends having a conversation at home.
Anthology ID:
2024.lrec-main.1491
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
17165–17175
URL:
https://aclanthology.org/2024.lrec-main.1491
Cite (ACL):
Solene Virginie Evain, Solange Rossato, and François Portet. 2024. Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17165–17175, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French (Evain et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.1491.pdf