2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution
Akshat Dewan | Michal Ziemski | Henri Meylan | Lorenzo Concina | Bruno Pouliquen
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
This paper presents an end-to-end solution for the creation of fully automated conference meeting transcripts and their machine translations into various languages. This tool has been developed at the World Intellectual Property Organization (WIPO) using in-house developed speech-to-text (S2T) and machine translation (MT) components. Beyond describing data collection and fine-tuning, resulting in a highly customized and robust system, this paper describes the architecture and evolution of the technical components and highlights the business impact and benefits from the user side. We also point out particular challenges in the evolution and adoption of the system and how the new approach created a new product and replaced existing established workflows in conference management documentation.
2016
The United Nations Parallel Corpus v1.0
Michał Ziemski | Marcin Junczys-Dowmunt | Bruno Pouliquen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish. The corpus is freely available for download under a liberal license. Apart from the pairwise aligned documents, a fully aligned subcorpus for the six official UN languages is distributed. We provide baseline BLEU scores of our Moses-based SMT systems trained with the full data of language pairs involving English and for all possible translation directions of the six-way subcorpus.
2015
SMT at the International Maritime Organization: experiences with combining in-house corpora with out-of-domain corpora
Bruno Pouliquen | Marcin Junczys-Dowmunt | Blanca Pinero | Michal Ziemski
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
SMT at the International Maritime Organization experiences with combining in-house corpus with more general corpus
Bruno Pouliquen | Marcin Junczys-Dowmunt | Blanca Pinero | Michał Ziemski
Proceedings of the 18th Annual Conference of the European Association for Machine Translation