2024
Native language Identification for Arabic Language Learners using Pre-trained Language Models
Mohamed Amine Cheragui
|
Mourad Abbas
|
Mohammed Mediani
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)
2020
German-Arabic Speech-to-Speech Translation for Psychiatric Diagnosis
Juan Hussain
|
Mohammed Mediani
|
Moritz Behr
|
M. Amin Cheragui
|
Sebastian Stüker
|
Alexander Waibel
Proceedings of the Fifth Arabic Natural Language Processing Workshop
In this paper we present the natural language processing components of our German-Arabic speech-to-speech translation system, which is being deployed in the context of interpretation during psychiatric diagnostic interviews. For this purpose we have built a pipelined speech-to-speech translation system consisting of automatic speech recognition, text post-processing/segmentation, machine translation and speech synthesis systems. We have implemented two pipelines, from German to Arabic and Arabic to German, in order to be able to conduct interpreted two-way dialogues between psychiatrists and potential patients. All systems in our pipeline have been realized as all-neural end-to-end systems, using different architectures suitable for the different components. The speech recognition systems use an encoder/decoder + attention architecture, the text segmentation component and the machine translation system are based on the Transformer architecture, and for the speech synthesis systems we use Tacotron 2 for generating spectrograms and WaveGlow as vocoder. The speech translation is deployed in a server-based speech translation application that implements a turn-based translation between a German-speaking psychiatrist administering the Mini-International Neuropsychiatric Interview (M.I.N.I.) and an Arabic-speaking person answering the interview. As this is a very specific domain, in addition to the linguistic challenges posed by translating between Arabic and German, we also focus in this paper on the methods we implemented for adapting our speech translation system to the domain of this psychiatric interview.
2016
Lecture Translator - Speech translation framework for simultaneous lecture translation
Markus Müller
|
Thai Son Nguyen
|
Jan Niehues
|
Eunah Cho
|
Bastian Krüger
|
Thanh-Le Ha
|
Kevin Kilgour
|
Matthias Sperber
|
Mohammed Mediani
|
Sebastian Stüker
|
Alex Waibel
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2016
Thanh-Le Ha
|
Eunah Cho
|
Jan Niehues
|
Mohammed Mediani
|
Matthias Sperber
|
Alexandre Allauzen
|
Alexander Waibel
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Adaptation and Combination of NMT Systems: The KIT Translation Systems for IWSLT 2016
Eunah Cho
|
Jan Niehues
|
Thanh-Le Ha
|
Matthias Sperber
|
Mohammed Mediani
|
Alex Waibel
Proceedings of the 13th International Conference on Spoken Language Translation
In this paper, we present the KIT systems of the IWSLT 2016 machine translation evaluation. We participated in the machine translation (MT) task as well as the spoken language translation (SLT) track for English→German and German→English translation. We use attentional neural machine translation (NMT) for all our submissions. We investigated different methods to adapt the system using small in-domain data as well as methods to train the system on these small corpora. In addition, we investigated methods to combine NMT systems that encode the input as well as the output differently. We combine systems using different vocabularies, reverse translation systems, and multi-source translation systems. In addition, we used pre-translation systems that facilitate phrase-based machine translation systems. Results show that applying domain adaptation and ensemble techniques brings a crucial improvement of 3-4 BLEU points over the baseline system. In addition, system combination using n-best lists yields a further 1-2 BLEU points.
2015
The Karlsruhe Institute of Technology Translation Systems for the WMT 2015
Eunah Cho
|
Thanh-Le Ha
|
Jan Niehues
|
Teresa Herrmann
|
Mohammed Mediani
|
Yuqi Zhang
|
Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation
The KIT translation systems for IWSLT 2015
Thanh-Le Ha
|
Jan Niehues
|
Eunah Cho
|
Mohammed Mediani
|
Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign
2014
Combined spoken language translation
Markus Freitag
|
Joern Wuebker
|
Stephan Peitz
|
Hermann Ney
|
Matthias Huck
|
Alexandra Birch
|
Nadir Durrani
|
Philipp Koehn
|
Mohammed Mediani
|
Isabel Slawik
|
Jan Niehues
|
Eunah Cho
|
Alex Waibel
|
Nicola Bertoldi
|
Mauro Cettolo
|
Marcello Federico
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques which were applied by the different individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show the combination approach developed at RWTH Aachen University which combined the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.
The KIT translation systems for IWSLT 2014
Isabel Slawik
|
Mohammed Mediani
|
Jan Niehues
|
Yuqi Zhang
|
Eunah Cho
|
Teresa Herrmann
|
Thanh-Le Ha
|
Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in the TED translation tasks of the IWSLT 2014 machine translation evaluation. We submitted phrase-based translation systems for all three official directions, namely English→German, German→English, and English→French, as well as for the optional directions English→Chinese and English→Arabic. For the official directions we built systems for both the machine translation and the spoken language translation tracks. This year we improved our systems’ performance over last year through n-best list rescoring using neural network-based translation and language models and novel preordering rules based on tree information of multiple syntactic levels. Furthermore, we successfully applied a novel phrase extraction algorithm and transliteration of unknown words for Arabic. We also submitted a contrastive system for German→English built with stemmed German adjectives. For the SLT tracks, we used a monolingual translation system to translate the lowercased ASR hypotheses with all punctuation stripped to truecased, punctuated output as a preprocessing step to our usual translation system.
Improving in-domain data selection for small in-domain sets
Mohammed Mediani
|
Joshua Winebarger
|
Alexander Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
Finding sufficient in-domain text data for language modeling is a recurrent challenge. Some methods have already been proposed for selecting parts of out-of-domain text data most closely resembling the in-domain data using a small amount of the latter. Including this new “near-domain” data in training can potentially lead to better language model performance, while reducing training resources relative to incorporating all data. One popular, state-of-the-art selection process based on cross-entropy scores makes use of in-domain and out-of-domain language models. In order to compensate for the limited availability of the in-domain data required for this method, we introduce enhancements to two of its steps. Firstly, we improve the procedure for drawing the out-of-domain sample data used for selection. Secondly, we use word-associations in order to extend the underlying vocabulary of the sample language models used for scoring. These enhancements are applied to selecting text for language modeling of talks given in a technical subject area. Besides comparing perplexity, we judge the resulting language models by their performance in automatic speech recognition and machine translation tasks. We evaluate our method in different contexts. We show that it yields consistent improvements, up to 2% absolute reduction in word error rate and 0.3 BLEU points. We achieve these improvements even given a much smaller in-domain set.
The Karlsruhe Institute of Technology Translation Systems for the WMT 2014
Teresa Herrmann
|
Mohammed Mediani
|
Eunah Cho
|
Thanh-Le Ha
|
Jan Niehues
|
Isabel Slawik
|
Yuqi Zhang
|
Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation
2013
The Karlsruhe Institute of Technology Translation Systems for the WMT 2013
Eunah Cho
|
Thanh-Le Ha
|
Mohammed Mediani
|
Jan Niehues
|
Teresa Herrmann
|
Isabel Slawik
|
Alex Waibel
Proceedings of the Eighth Workshop on Statistical Machine Translation
Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz
|
Saab Mansour
|
Matthias Huck
|
Markus Freitag
|
Hermann Ney
|
Eunah Cho
|
Teresa Herrmann
|
Mohammed Mediani
|
Jan Niehues
|
Alex Waibel
|
Alexandre Allauzen
|
Quoc Khanh Do
|
Bianka Buschbeck
|
Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation
EU-BRIDGE MT: text translation of talks in the EU-BRIDGE project
Markus Freitag
|
Stephan Peitz
|
Joern Wuebker
|
Hermann Ney
|
Nadir Durrani
|
Matthias Huck
|
Philipp Koehn
|
Thanh-Le Ha
|
Jan Niehues
|
Mohammed Mediani
|
Teresa Herrmann
|
Alex Waibel
|
Nicola Bertoldi
|
Mauro Cettolo
|
Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, English→French and German→English. Four research institutions involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the machine translation track of the evaluation campaign at the 2013 International Workshop on Spoken Language Translation (IWSLT). We present the methods and techniques to achieve high translation quality for text translation of talks which are applied at RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show how we have been able to considerably boost translation performance (as measured in terms of the metrics BLEU and TER) by means of system combination. The joint setups yield empirical gains of up to 1.4 points in BLEU and 2.8 points in TER on the IWSLT test sets compared to the best single systems.
The KIT translation systems for IWSLT 2013
Thanh-Le Ha
|
Teresa Herrmann
|
Jan Niehues
|
Mohammed Mediani
|
Eunah Cho
|
Yuqi Zhang
|
Isabel Slawik
|
Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in all three official directions, namely English→German, German→English, and English→French, in translation tasks of the IWSLT 2013 machine translation evaluation. Additionally, we present the results for our submissions to the optional directions English→Chinese and English→Arabic. We used phrase-based translation systems to generate the translations. This year, we focused on adapting the systems towards ASR input. Furthermore, we investigated different reordering models as well as an extended discriminative word lexicon. Finally, we added a data selection approach for domain adaptation.
2012
The KIT translation systems for IWSLT 2012
Mohammed Mediani
|
Yuqi Zhang
|
Thanh-Le Ha
|
Jan Niehues
|
Eunah Cho
|
Teresa Herrmann
|
Rainer Kärgel
|
Alexander Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs. Our system is a phrase-based statistical machine translation system, extended with many additional models which were proven to enhance the translation quality. For instance, it uses the part-of-speech (POS)-based reordering, translation and language model adaptation, bilingual language model, word-cluster language model, discriminative word lexica (DWL), and continuous space language model. In addition to this, the system incorporates special steps in the preprocessing and in the post-processing step. In the preprocessing the noisy corpora are filtered by removing the noisy sentence pairs, whereas in the postprocessing the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information. Our system deals with speech transcription input by removing case information and punctuation except periods from the text translation model.
The Karlsruhe Institute of Technology Translation Systems for the WMT 2012
Jan Niehues
|
Yuqi Zhang
|
Mohammed Mediani
|
Teresa Herrmann
|
Eunah Cho
|
Alex Waibel
Proceedings of the Seventh Workshop on Statistical Machine Translation
2011
The KIT English-French translation systems for IWSLT 2011
Mohammed Mediani
|
Eunah Cho
|
Jan Niehues
|
Teresa Herrmann
|
Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper presents the KIT system participating in the English→French TALK Translation tasks in the framework of the IWSLT 2011 machine translation evaluation. Our system is a phrase-based translation system using POS-based reordering extended with many additional features. First of all, special preprocessing is applied to the Giga corpus in order to minimize the effect of the great amount of noise it contains. In addition, the system gives more importance to the in-domain data by adapting the translation and the language models as well as by using a word-cluster language model. Furthermore, the system is extended by a bilingual language model and a discriminative word lexicon. The automatic speech transcription input usually has no or wrong punctuation marks; therefore, these marks were removed from the source training data for the SLT system training.
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011
Teresa Herrmann
|
Mohammed Mediani
|
Jan Niehues
|
Alex Waibel
Proceedings of the Sixth Workshop on Statistical Machine Translation
2010
The KIT translation system for IWSLT 2010
Jan Niehues
|
Mohammed Mediani
|
Teresa Herrmann
|
Michael Heck
|
Christian Herff
|
Alex Waibel
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
Jan Niehues
|
Teresa Herrmann
|
Mohammed Mediani
|
Alex Waibel
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR