Christian Huber


2021

pdf
KIT’s IWSLT 2021 Offline Speech Translation System
Tuan Nam Nguyen | Thai Son Nguyen | Christian Huber | Ngoc-Quan Pham | Thanh-Le Ha | Felix Schneider | Sebastian Stüker
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.

2020

pdf
Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting
Christian Huber | Juan Hussain | Tuan-Nam Nguyen | Kaihang Song | Sebastian Stüker | Alexander Waibel
Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems

When training speech recognition systems, one often faces the situation that sufficient amounts of training data for the language in question are available but only small amounts of data for the domain in question. This problem is even bigger for end-to-end speech recognition systems that only accept transcribed speech as training data, which is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems by a method which is called batch-weighting and which we contrast against regular fine-tuning, i.e., to continue to train existing neural speech recognition models on adaptation data. We perform experiments using theses techniques in adapting to topic, accent and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. In order to show the generalization capabilities of batch-weighting we perform experiments in several languages, i.e., Arabic, English and German. Due to its relatively small computational requirements batch-weighting is a suitable technique for supervised life-long learning during the life-time of a speech recognition system, e.g., from user corrections.