Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages

Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker


Abstract
This paper builds on recent work that leverages corpora and tools originally developed for speech technologies to conduct corpus-based linguistic studies. We address the non-canonical realization of consonants in connected speech, focusing on voicing alternation in stops in the standard varieties of five Romance languages (French, Italian, Spanish, Portuguese, Romanian). For these languages, both large-scale corpora and speech recognition systems were available for the study. We use forced alignment with pronunciation variants and machine learning techniques to examine to what extent such frequent phenomena characterize the languages and which factors most strongly trigger them. The results confirm that voicing alternations occur in all five languages. Automatic classification highlights surrounding context and segment duration as recurring contributing factors in modeling voicing alternation. The results also demonstrate the role that machine learning techniques such as classification algorithms can play in extracting linguistic knowledge from speech and in suggesting research directions.
Anthology ID:
2022.lrec-1.348
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3257–3263
URL:
https://aclanthology.org/2022.lrec-1.348
Cite (ACL):
Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, and Martine Adda-Decker. 2022. Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3257–3263, Marseille, France. European Language Resources Association.
Cite (Informal):
Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages (Wu et al., LREC 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.348.pdf