Patrick Bauer


2010

pdf
WTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network
Patrick Bauer | David Scheler | Tim Fingscheidt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT has been recorded called WTIMIT. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidth-extended acoustic models for automatic speech recognition (ASR). Wideband telephony could enable network-side speech recognition applications such as remote dictation or spelling without the need of distributed speech recognition techniques. The WTIMIT corpus was transmitted via two prepared Nokia 6220 mobile phones over T-Mobile's 3G wideband mobile network in The Hague, The Netherlands, employing the Adaptive Multirate Wideband (AMR-WB) speech codec. The paper presents observations of transmission effects and phoneme recognition experiments. It turns out that in the case of wideband telephony, server-side ASR should not be carried out by simply decimating received signals to 8 kHz and applying existent narrowband acoustic models. Nor do we recommend just simulating the AMR-WB codec for training of wideband acoustic models. Instead, real-world wideband telephony channel data (such as WTIMIT) provides the best training material for wideband IVR systems.