Pavel Denisov


2021

pdf
IMS’ Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov | Manuel Mager | Ngoc Thang Vu
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. Moreover, we also explore the feasibility of a full end-to-end speech translation (ST) model in the case of very constrained amount of ground truth labeled data. Our best system achieves the best performance among all submitted systems for Congolese Swahili to English and French with BLEU scores 7.7 and 13.7 respectively, and the second best result for Coastal Swahili to English with BLEU score 14.9.

2020

pdf
ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents
Chia-Yu Li | Daniel Ortega | Dirk Väth | Florian Lux | Lindsey Vanderlyn | Maximilian Schmidt | Michael Neumann | Moritz Völkel | Pavel Denisov | Sabrina Jenne | Zorica Kacarevic | Ngoc Thang Vu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research.