Kay Rottmann


2021

Training data reduction for multilingual Spoken Language Understanding systems
Anmol Bansal | Anjali Shenoy | Krishna Chaitanya Pappu | Kay Rottmann | Anurag Dwarakanath
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Fine-tuning self-supervised pre-trained language models such as BERT has significantly improved state-of-the-art performance on natural language processing tasks. Similar fine-tuning setups can also be used in commercial large-scale Spoken Language Understanding (SLU) systems to perform intent classification and slot tagging on user queries. Fine-tuning such powerful models for use in commercial systems requires large amounts of training data and compute resources to achieve high performance. This paper studies empirical methods for identifying training-data redundancies in the fine-tuning paradigm. In particular, we explore rule-based and semantic techniques to reduce data in a multilingual fine-tuning setting and report our results on key SLU metrics. Through our experiments, we show that fine-tuning on a reduced data set can achieve performance on par with, or better than, a model fine-tuned on the entire data set.
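The abstract does not spell out the reduction recipe, but as a rough illustration of the "semantic" side of such techniques, the sketch below greedily drops utterances whose sentence embedding is nearly identical to one already kept. This is a minimal sketch, not the authors' published method: the encoder choice, model name, similarity threshold, and the greedy strategy in semantic_dedup are all illustrative assumptions.

# Minimal sketch of semantic near-duplicate filtering for SLU training data.
# NOTE: an illustration, not the paper's procedure; model name, threshold,
# and greedy keep-or-drop strategy are assumptions made for this example.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def semantic_dedup(utterances, threshold=0.95,
                   model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Greedily keep utterances whose embedding is not too close to any kept one."""
    model = SentenceTransformer(model_name)
    emb = model.encode(utterances, convert_to_numpy=True)
    # Unit-normalize rows so dot products are cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

    kept_idx = []
    for i in range(len(utterances)):
        if kept_idx:
            sims = emb[kept_idx] @ emb[i]  # cosine similarity to every kept utterance
            if sims.max() >= threshold:
                continue  # near-duplicate of something already kept: drop it
        kept_idx.append(i)
    return [utterances[i] for i in kept_idx]

if __name__ == "__main__":
    data = [
        "play some jazz music",
        "play jazz music please",
        "set an alarm for 7 am",
        "wake me up at 7 in the morning",
    ]
    print(semantic_dedup(data, threshold=0.9))

In practice the threshold trades off corpus size against coverage: a lower threshold removes more data but risks discarding utterances that carry distinct slot values, which is presumably why the paper evaluates reduction against key SLU metrics rather than size alone.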

2010

Tools for Collecting Speech Corpora via Mechanical-Turk
Ian Lane | Matthias Eck | Kay Rottmann | Alex Waibel
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2008

Recent Improvements in the CMU Large Scale Chinese-English SMT System
Almut Silja Hildebrand | Kay Rottmann | Mohamed Noamany | Qin Gao | Sanjika Hewavitharana | Nguyen Bach | Stephan Vogel
Proceedings of ACL-08: HLT, Short Papers

2007

The ISL Phrase-Based MT System for the 2007 ACL Workshop on Statistical Machine Translation
Matthias Paulik | Kay Rottmann | Jan Niehues | Silja Hildebrand | Stephan Vogel
Proceedings of the Second Workshop on Statistical Machine Translation

The CMU-UKA statistical machine translation systems for IWSLT 2007
Ian Lane | Andreas Zollmann | Thuy Linh Nguyen | Nguyen Bach | Ashish Venugopal | Stephan Vogel | Kay Rottmann | Ying Zhang | Alex Waibel
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework, but for each language pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT, and for the Arabic→English task we focused on incorporating morphological decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches, each of which proved effective for the language pair it was evaluated on.

Word reordering in statistical machine translation with a POS-based distortion model
Kay Rottmann | Stephan Vogel
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers