Sarah Fünfer


2016

pdf
Evaluation of the KIT Lecture Translation System
Markus Müller | Sarah Fünfer | Sebastian Stüker | Alex Waibel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

To attract foreign students is among the goals of the Karlsruhe Institute of Technology (KIT). One obstacle to achieving this goal is that lectures at KIT are usually held in German which many foreign students are not sufficiently proficient in, as, e.g., opposed to English. While the students from abroad are learning German during their stay at KIT, it is challenging to become proficient enough in it in order to follow a lecture. As a solution to this problem we offer our automatic simultaneous lecture translation. It translates German lectures into English in real time. While not as good as human interpreters, the system is available at a price that KIT can afford in order to offer it in potentially all lectures. In order to assess whether the quality of the system we have conducted a user study. In this paper we present this study, the way it was conducted and its results. The results indicate that the quality of the system has passed a threshold as to be able to support students in their studies. The study has helped to identify the most crucial weaknesses of the systems and has guided us which steps to take next.

2014

pdf
A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation
Eunah Cho | Sarah Fünfer | Sebastian Stüker | Alex Waibel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

With the increasing number of applications handling spontaneous speech, the needs to process spoken languages become stronger. Speech disfluency is one of the most challenging tasks to deal with in automatic speech processing. As most applications are trained with well-formed, written texts, many issues arise when processing spontaneous speech due to its distinctive characteristics. Therefore, more data with annotated speech disfluencies will help the adaptation of natural language processing applications, such as machine translation systems. In order to support this, we have annotated speech disfluencies in German lectures at KIT. In this paper we describe how we annotated the disfluencies in the data and provide detailed statistics on the size of the corpus and the speakers. Moreover, machine translation performance on a source text including disfluencies is compared to the results of the translation of a source text without different sorts of disfluencies or no disfluencies at all.