How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

Imran Sheikh; Irina Illina; Dominique Fohr

Abstract

Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of the topic and semantic context of the OOV words, captured from a diachronic text corpus. In this paper, we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first present our diachronic French broadcast news datasets, which highlight the motivation for our study of OOV PNs. We then analyse the effect of using diachronic text data from different sources and over a different time span. From OOV PN retrieval experiments on French broadcast news videos, we conclude that a diachronic corpus with text from different sources leads to better retrieval performance than one relying on text from a single source or from a longer time span.

Anthology ID:: L16-1609
Volume:: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:: May
Year:: 2016
Address:: Portorož, Slovenia
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
Pages:: 3851–3855
URL:: https://preview.aclanthology.org/landing_page/L16-1609/
Cite (ACL):: Imran Sheikh, Irina Illina, and Dominique Fohr. 2016. How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3851–3855, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):: How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News (Sheikh et al., LREC 2016)
PDF:: https://preview.aclanthology.org/landing_page/L16-1609.pdf