Evelyne Tzoukermann

Also published as: E. Tzoukermann

2022

Speech-to-Text and Evaluation of Multiple Machine Translation Systems
Evelyne Tzoukermann | Steven Van Guilder | Jennifer Doyon | Ekaterina Harke
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

The National Virtual Translation Center (NVTC) and the larger Federal Bureau of Investiga-tion (FBI) seek to acquire tools that will facilitate its mission to provide English translations of non-English language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow. While we have explored the use of speech-to-text (STT) and speech translation (ST) in the past, we have now invested in the creation of a substantial human-created corpus to thoroughly evaluate alternatives in three languages: French, Rus-sian, and Persian. We report on the results of multiple STT systems combined with four MT systems for these languages. We evaluated and scored the different systems in combination and analyzed results. This points the way to the most successful tool combination to deploy in this workflow.

2021

pdf bib abs

Corpus Creation and Evaluation for Speech-to-Text and Speech Translation
Corey Miller | Evelyne Tzoukermann | Jennifer Doyon | Elizabeth Mallard
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

The National Virtual Translation Center (NVTC) seeks to acquire human language technology (HLT) tools that will facilitate its mission to provide verbatim English translations of foreign language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow (Miller et al., 2020). While we have explored the use of speech-totext (STT) and speech translation (ST) in the past (Tzoukermann and Miller, 2018), we have now invested in the creation of a substantial human-made corpus to thoroughly evaluate alternatives. Results from our analysis of this corpus and the performance of HLT tools point the way to the most promising ones to deploy in our workflow.

2018

pdf bib

Evaluating Automatic Speech Recognition in Translation
Evelyne Tzoukermann | Corey Miller
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

2005

pdf bib abs

Rapid Ramp-up for Statistical Machine Translation: Minimal Training for Maximal Coverage
Hemali Majithia | Philip Rennart | Evelyne Tzoukermann
Proceedings of Machine Translation Summit X: Posters

This paper investigates optimal ways to get maximal coverage from minimal input training corpus. In effect, it seems antagonistic to think of minimal input training with a statistical machine translation system. Since statistics work well with repetition and thus capture well highly occurring words, one challenge has been to figure out the optimal number of “new” words that the system needs to be appropriately trained. Additionally, the goal is to minimize the human translation time for training a new language. In order to account for rapid ramp-up translation, we ran several experiments to figure out the minimal amount of data to obtain optimal translation results.