Sergio Penkale
2025
CUNI and Phrase at WMT25 MT Evaluation Task
Miroslav Hrabal | Ondrej Glembek | Aleš Tamchyna | Almut Silja Hildebrand | Alan Eckhard | Miroslav Štola | Sergio Penkale | Zuzana Šimečková | Ondřej Bojar | Alon Lavie | Craig Stewart
Proceedings of the Tenth Conference on Machine Translation
This paper describes the joint effort of Phrase a.s. and Charles University’s Institute of Formal and Applied Linguistics (CUNI/UFAL) on the WMT25 Automated Translation Quality Evaluation Systems Shared Task. The two teams participated both collaboratively and competitively, i.e. each submitted a system of its own as well as a contrastive joint system ensemble. In Task 1, we show that such ensembling, if chosen in a clever way, can lead to a performance boost. We present an analysis of various kinds of systems, comprising both the “traditional” NN-based approach and different flavours of LLMs: off-the-shelf commercial models, their fine-tuned versions, and in-house, custom-trained alternative models. In Tasks 2 and 3 we show Phrase’s approach to tackling the tasks via various GPT models: Error Span Annotation via the complete MQM solution using non-reasoning models (including fine-tuned versions) in Task 2, and using reasoning models in Task 3.
2014
Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Rejwanul Haque | Sergio Penkale | Andy Way
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)
2013
Tailor-made quality-controlled translation
Sergio Penkale
Proceedings of Translating and the Computer 35
2012
SmartMATE: An Online End-To-End MT Post-Editing Framework
Sergio Penkale | Andy Way
Workshop on Post-Editing Technology and Practice
It is well known that the amount of content available to be translated and localised far exceeds the translation resources currently available. Automation in general, and Machine Translation (MT) in particular, is one of the key technologies which can help improve this situation. However, a tool that integrates all of the components needed for the localisation process is still missing, and MT is still out of reach for most localisation professionals. In this paper we present an online translation environment which empowers users with MT by enabling engines to be created from their data, without a need for technical knowledge or special hardware requirements, and at low cost. Documents in a variety of formats can then be post-edited after being processed with their Translation Memories, MT engines and glossaries. We give an overview of the tool and present a case study of a project for a large games company, showing the applicability of our tool.
From Subtitles to Parallel Corpora
Mark Fishel | Yota Georgakopoulou | Sergio Penkale | Volha Petukhova | Matej Rojc | Martin Volk | Andy Way
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles
Volha Petukhova | Rodrigo Agerri | Mark Fishel | Sergio Penkale | Arantza del Pozo | Mirjam Sepesy Maučec | Andy Way | Panayota Georgakopoulou | Martin Volk
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process. The FP7 European project SUMAT (An Online Service for SUbtitling by MAchine Translation: http://www.sumat-project.eu) aims to develop an online subtitle translation service for nine European languages, combined into 14 different language pairs, in order to semi-automate the subtitle translation processes of both freelance translators and subtitling companies on a large scale. In this paper we discuss the data collection and parallel corpus compilation for training SMT systems, which includes several procedures such as data partition, conversion, formatting, normalization and alignment. We discuss each data pre-processing step in detail, using various approaches. Apart from its size (around 1 million subtitles per language pair), the SUMAT corpus has a number of very important characteristics. First, high quality has been achieved both in terms of translation and in terms of high-precision alignment of parallel documents and their contents. Second, the contents are provided in one consistent format and encoding. Finally, additional information such as the type of content in terms of genre and domain is available.
2010
Accuracy-Based Scoring for Phrase-Based Statistical Machine Translation
Sergio Penkale | Yanjun May | Daniel Galron | Andy Way
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Although the scoring features of state-of-the-art Phrase-Based Statistical Machine Translation (PB-SMT) models are weighted so as to optimise an objective function measuring translation quality, the estimation of the features themselves does not have any relation to such quality metrics. In this paper, we introduce a translation quality-based feature to PB-SMT in a bid to improve the translation quality of the system. Our feature is estimated by averaging the edit-distance between phrase pairs involved in the translation of oracle sentences, chosen by automatic evaluation metrics from the N-best outputs of a baseline system, and phrase pairs occurring in the N-best list. Using our method, we report a statistically significant 2.11% relative improvement in BLEU score for the WMT 2009 Spanish-to-English translation task. We also report that using our method we can achieve statistically significant improvements over the baseline using many other MT evaluation metrics, and a substantial increase in speed and reduction in memory use (due to a reduction in phrase-table size of 87%) while maintaining significant gains in translation quality.
MATREX: The DCU MT System for WMT 2010
Sergio Penkale | Rejwanul Haque | Sandipan Dandapat | Pratyush Banerjee | Ankit K. Srivastava | Jinhua Du | Pavel Pecina | Sudip Kumar Naskar | Mikel L. Forcada | Andy Way
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2009
Co-authors
- Andy Way 8
- Jinhua Du 2
- Mark Fishel 2
- Daniel Galron 2
- Rejwanul Haque 2
- Volha Petukhova 2
- Martin Volk 2
- Rodrigo Agerri 1
- Pratyush Banerjee 1
- Ondřej Bojar 1
- Sandipan Dandapat 1
- Arantza Del Pozo 1
- Alan Eckhard 1
- Mikel L. Forcada 1
- Yota Georgakopoulou 1
- Panayota Georgakopoulou 1
- Ondrej Glembek 1
- Yifan He 1
- Almut Silja Hildebrand 1
- Miroslav Hrabal 1
- Alon Lavie 1
- Mirjam Sepesy Maucec 1
- Yanjun May 1
- I. Dan Melamed 1
- Sudip Kumar Naskar 1
- Pavel Pecina 1
- Matej Rojc 1
- Ankit Srivastava 1
- Craig Stewart 1
- Aleš Tamchyna 1
- Zuzana Šimečková 1
- Miroslav Štola 1