2013
The 2013 KIT IWSLT speech-to-text systems for German and English
Kevin Kilgour | Christian Mohr | Michael Heck | Quoc Bao Nguyen | Van Huy Nguyen | Evgeniy Shin | Igor Tseyzer | Jonas Gehring | Markus Müller | Matthias Sperber | Sebastian Stüker | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes our English Speech-to-Text (STT) systems for the 2013 IWSLT TED ASR track. The systems consist of multiple subsystems that are combinations of different front-ends, e.g. MVDR-MFCC based and lMel based ones, GMM and NN acoustic models and different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two stages, where the systems of the second stage are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR.
2012
The KIT Lecture Corpus for Speech Translation
Sebastian Stüker | Florian Kraft | Christian Mohr | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Academic lectures offer valuable content, but often do not reach their full potential audience due to the language barrier. Human translations of lectures are too expensive to be widely used. Speech translation technology can be an affordable alternative in this case. State-of-the-art speech translation systems utilize statistical models that need to be trained on large amounts of in-domain data. In order to support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper we describe how we recorded the lectures and how we annotated them. We further give detailed statistics on the types of lectures in the corpus and its size. We collected the corpus with the purpose in mind that it should not just be suited for training a spoken language translation system the traditional way, but should also enable us to research techniques that allow the translation system to automatically and autonomously adapt itself to the varying topics and speakers of lectures.
The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation
Christian Saam | Christian Mohr | Kevin Kilgour | Michael Heck | Matthias Sperber | Keigo Kubo | Sebastian Stüker | Sakriani Sakti | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes our English Speech-to-Text (STT) systems for the 2012 IWSLT TED ASR track evaluation. The systems consist of 10 subsystems that are combinations of different front-ends, e.g. MVDR based and MFCC based ones, and two different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two stages, where the systems of the second stage are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR.
The KIT-NAIST (contrastive) English ASR system for IWSLT 2012
Michael Heck | Keigo Kubo | Matthias Sperber | Sakriani Sakti | Sebastian Stüker | Christian Saam | Kevin Kilgour | Christian Mohr | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the KIT-NAIST (Contrastive) English speech recognition system for the IWSLT 2012 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. The system was developed by Karlsruhe Institute of Technology (KIT) and Nara Institute of Science and Technology (NAIST) teams in collaboration within the interACT project. We employ single system decoding with fully continuous and semi-continuous models, as well as a three-stage, multipass system combination framework built with the Janus Recognition Toolkit. On the IWSLT 2010 test set our single system introduced in this work achieves a WER of 17.6%, and our final combination achieves a WER of 14.4%.
2011
The 2011 KIT QUAERO speech-to-text system for Spanish
Kevin Kilgour | Christian Saam | Christian Mohr | Sebastian Stüker | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers
This paper describes our current Spanish speech-to-text (STT) system, developed within the Quaero program, with which we participated in the 2011 Quaero STT evaluation. The system consists of 4 separate subsystems: in addition to the standard MFCC and MVDR phoneme-based subsystems, we included both a phoneme-based and a grapheme-based bottleneck subsystem. We carefully evaluate the performance of each subsystem. After including several new techniques, we were able to reduce the WER by over 30% relative, from 20.79% to 14.53%.
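The WER figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the "over 30%" figure is a relative reduction (the absolute drop is about 6 points); the function name `relative_wer_reduction` is a hypothetical helper introduced here for illustration, not something taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates given in percent.
    Hypothetical helper for illustration only.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the abstract: 20.79% WER before, 14.53% after.
absolute_drop = 20.79 - 14.53
relative_drop = relative_wer_reduction(20.79, 14.53)
print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.1f}%")
```

Read this way, the improvement is roughly 6.3 points absolute and just over 30% relative, which is consistent with the wording of the abstract.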