Petr Zemánek


2024

Findings of the IWSLT 2024 Evaluation Campaign
Ibrahim Said Ahmad | Antonios Anastasopoulos | Ondřej Bojar | Claudia Borg | Marine Carpuat | Roldano Cattoni | Mauro Cettolo | William Chen | Qianqian Dong | Marcello Federico | Barry Haddow | Dávid Javorský | Mateusz Krubiński | Tsz Kin Lam | Xutai Ma | Prashant Mathur | Evgeny Matusov | Chandresh Maurya | John McCrae | Kenton Murray | Satoshi Nakamura | Matteo Negri | Jan Niehues | Xing Niu | Atul Kr. Ojha | John Ortega | Sara Papi | Peter Polák | Adam Pospíšil | Pavel Pecina | Elizabeth Salesky | Nivedita Sethiya | Balaram Sarkar | Jiatong Shi | Claytone Sikasote | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Brian Thompson | Alex Waibel | Shinji Watanabe | Patrick Wilken | Petr Zemánek | Rodolfo Zevallos
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 17 teams whose submissions are documented in 27 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

2023

Multi-Parallel Corpus of North Levantine Arabic
Mateusz Krubiński | Hashem Sellat | Shadi Saleh | Adam Pospíšil | Petr Zemánek | Pavel Pecina
Proceedings of ArabicNLP 2023

Low-resource Machine Translation (MT) is characterized by the scarce availability of training data and/or standardized evaluation benchmarks. In the context of Dialectal Arabic, recent works introduced several evaluation benchmarks covering both Modern Standard Arabic (MSA) and dialects, mapping, however, mostly to a single Indo-European language: English. In this work, we introduce a multi-lingual corpus consisting of 120,600 multi-parallel sentences in English, French, German, Greek, Spanish, and MSA selected from the OpenSubtitles corpus, which were manually translated into North Levantine Arabic. By conducting a series of training and fine-tuning experiments, we explore how this novel resource can contribute to the research on Arabic MT.

2014

Quotations, Relevance and Time Depth: Medieval Arabic Literature in Grids and Networks
Petr Zemánek | Jiří Milička
Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)