The RATS Collection: Supporting HLT Research with Degraded Audio Data
David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, Ann Sawyer
Abstract
The DARPA RATS program was established to foster development of language technology systems that can perform well on speaker-to-speaker communications over radio channels that evince a wide range in the type and extent of signal variability and acoustic degradation. Creating suitable corpora to address this need poses an equally wide range of challenges for the collection, annotation and quality assessment of relevant data. This paper describes the LDC's multi-year effort to build the RATS data collection, summarizes the content and properties of the resulting corpora, and discusses the novel problems and approaches involved in ensuring that the data would satisfy its intended use: to provide speech recordings and annotations for training and evaluating HLT systems that perform four specific tasks on difficult radio channels: Speech Activity Detection (SAD), Language Identification (LID), Speaker Identification (SID) and Keyword Spotting (KWS).
- Anthology ID:
- L14-1089
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1970–1977
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/1125_Paper.pdf
- Cite (ACL):
- David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, and Ann Sawyer. 2014. The RATS Collection: Supporting HLT Research with Degraded Audio Data. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1970–1977, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- The RATS Collection: Supporting HLT Research with Degraded Audio Data (Graff et al., LREC 2014)