LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos


Abstract
Speech is considered a multimodal process in which hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by reading the lips of the speaker, has been a focus of interest in recent decades. Nevertheless, training these systems in the current Deep Learning era requires large-scale databases. Moreover, while most of these databases are dedicated to English, other languages lack sufficient resources. Thus, this paper presents a semi-automatically annotated audiovisual database for unconstrained natural Spanish, providing 13 hours of data extracted from Spanish television. Furthermore, baseline results for both speaker-dependent and speaker-independent scenarios are reported using Hidden Markov Models, a traditional paradigm that has been widely used in the field of Speech Technologies.
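The abstract distinguishes speaker-dependent from speaker-independent evaluation. As a generic illustration only (not the official LIP-RTVE partitions, which are defined by the authors), the sketch below assumes each sample is a hypothetical `(speaker_id, clip_id)` pair. In the speaker-dependent setting, every test speaker is also seen during training. In the speaker-independent setting, train and test speakers are disjoint. The function names `speaker_independent_split` and `speaker_dependent_split` are illustrative and not part of any released code.

```python
import random
from collections import defaultdict


def speaker_independent_split(samples, test_fraction=0.2, seed=0):
    """Hold out whole speakers: no speaker appears in both train and test.

    `samples` is assumed to be a list of (speaker_id, clip_id) pairs.
    """
    speakers = sorted({spk for spk, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test


def speaker_dependent_split(samples, test_fraction=0.2, seed=0):
    """Hold out a fraction of each speaker's clips, so test speakers are also in train."""
    by_speaker = defaultdict(list)
    for spk, clip in samples:
        by_speaker[spk].append((spk, clip))
    rng = random.Random(seed)
    train, test = [], []
    for spk in sorted(by_speaker):
        clips = by_speaker[spk][:]
        rng.shuffle(clips)
        # int() rounds down, so each speaker keeps at least one training clip
        n_test = int(len(clips) * test_fraction)
        test.extend(clips[:n_test])
        train.extend(clips[n_test:])
    return train, test


if __name__ == "__main__":
    # Toy data: 5 hypothetical speakers with 10 clips each
    data = [(f"spk{i % 5}", f"clip{i}") for i in range(50)]
    tr, te = speaker_independent_split(data)
    print(len(tr), len(te))
    tr, te = speaker_dependent_split(data)
    print(len(tr), len(te))
```

With the toy data, both protocols hold out 10 clips. Only the speaker-independent split guarantees that the held-out speakers were never observed in training, which is why it is usually the harder scenario.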
Anthology ID:
2022.lrec-1.294
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2750–2758
URL:
https://aclanthology.org/2022.lrec-1.294
Cite (ACL):
David Gimeno-Gómez and Carlos-D. Martínez-Hinarejos. 2022. LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2750–2758, Marseille, France. European Language Resources Association.
Cite (Informal):
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild (Gimeno-Gómez & Martínez-Hinarejos, LREC 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.lrec-1.294.pdf
Code:
david-gimeno/lip-rtve