2021
Building MT systems in low resourced languages for Public Sector users in Croatia, Iceland, Ireland, and Norway
Róisín Moran | Carla Parra Escartín | Akshai Ramesh | Páraic Sheridan | Jane Dunne | Federico Gaspari | Sheila Castilho | Natalia Resende | Andy Way
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
When developing Machine Translation engines, low-resourced language pairs tend to be in a disadvantaged position: less available data means that developing robust MT models can be more challenging. The EU-funded PRINCIPLE project aims to overcome this challenge for four low-resourced European languages: Norwegian, Croatian, Irish and Icelandic. This presentation will give an overview of the project, with a focus on the set of Public Sector users and their use cases for which we have developed MT solutions. We will discuss the range of language resources that have been gathered through contributions from public sector collaborators, and present the extensive evaluations that have been undertaken, including significant user evaluation of MT systems across all of the public sector participants in each of the four countries involved.
2020
An Error-based Investigation of Statistical and Neural Machine Translation Performance on Hindi-to-Tamil and English-to-Tamil
Akshai Ramesh | Venkatesh Balavadhani Parthasa | Rejwanul Haque | Andy Way
Proceedings of the 7th Workshop on Asian Translation
Statistical machine translation (SMT) was the state of the art in machine translation (MT) research for more than two decades, but has since been superseded by neural MT (NMT). Despite producing state-of-the-art results in many translation tasks, neural models underperform in resource-poor scenarios. While some progress has been made, none of the present-day benchmarks that attempt to overcome this problem can be regarded as a universal solution for translating the many low-resource languages. In this work, we investigate the performance of phrase-based SMT (PB-SMT) and NMT on two rarely tested low-resource language pairs, English-to-Tamil and Hindi-to-Tamil, taking a specialised data domain (software localisation) into consideration. This paper presents our findings, including the identification of several issues with current neural approaches to low-resource, domain-specific text translation.
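To make the system comparison concrete, the sketch below shows one common way to score two systems' output (e.g. PB-SMT vs. NMT hypotheses) against shared references with sacrebleu. It is an illustration only, not the evaluation pipeline used in the paper, and the file names are hypothetical placeholders.

# Minimal sketch: score two MT systems against the same references with sacrebleu.
# Assumes sacrebleu is installed; file names are hypothetical placeholders.
from sacrebleu.metrics import BLEU, CHRF

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

refs = read_lines("test.ta")          # Tamil reference translations
smt_out = read_lines("smt.hyp.ta")    # PB-SMT hypotheses
nmt_out = read_lines("nmt.hyp.ta")    # NMT hypotheses

bleu, chrf = BLEU(), CHRF()
for name, hyp in [("PB-SMT", smt_out), ("NMT", nmt_out)]:
    # corpus_score expects a list of hypotheses and a list of reference streams
    print(name, bleu.corpus_score(hyp, [refs]), chrf.corpus_score(hyp, [refs]))

An error-based analysis of the kind reported in the paper would then look beyond such corpus-level scores at the segments where the two systems diverge.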
The ADAPT System Description for the WMT20 News Translation Task
Venkatesh Parthasarathy | Akshai Ramesh | Rejwanul Haque | Andy Way
Proceedings of the Fifth Conference on Machine Translation
This paper describes the ADAPT Centre’s submissions to the WMT20 News translation shared task for English-to-Tamil and Tamil-to-English. We present our machine translation (MT) systems that were built using the state-of-the-art neural MT (NMT) model, the Transformer. We applied various strategies to improve our baseline MT systems, e.g. monolingual sentence selection for creating synthetic training data, mining monolingual sentences for adapting our MT systems to the task, and hyperparameter search for the Transformer in low-resource scenarios. Our experiments show that adding the aforementioned techniques to the baseline yields excellent performance in the English-to-Tamil and Tamil-to-English translation tasks.
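As a rough illustration of monolingual sentence selection for synthetic training data, the sketch below ranks a monolingual pool by lexical overlap with an in-domain seed corpus and keeps the top-scoring sentences for back-translation. This is a simplified stand-in, not the selection method used in the ADAPT submission; the file names and the cut-off are hypothetical.

# Minimal sketch: rank monolingual sentences by word-level overlap with an
# in-domain seed corpus, then keep the top-k for back-translation.
from collections import Counter

def vocab_profile(lines):
    # Relative frequency of each token in the in-domain seed corpus.
    counts = Counter(tok for line in lines for tok in line.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def overlap_score(sentence, profile):
    toks = sentence.lower().split()
    if not toks:
        return 0.0
    return sum(profile.get(tok, 0.0) for tok in toks) / len(toks)

with open("in_domain.seed.txt", encoding="utf-8") as f:
    profile = vocab_profile(f.readlines())

with open("monolingual.pool.txt", encoding="utf-8") as f:
    pool = [line.strip() for line in f if line.strip()]

ranked = sorted(pool, key=lambda s: overlap_score(s, profile), reverse=True)
selected = ranked[:100000]   # hypothetical cut-off: top-k sentences to back-translate

with open("selected.for_bt.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(selected))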
Investigating Low-resource Machine Translation for English-to-Tamil
Akshai Ramesh | Venkatesh Balavadhani Parthasa | Rejwanul Haque | Andy Way
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Statistical machine translation (SMT), which was the dominant paradigm in machine translation (MT) research for nearly three decades, has recently been superseded by end-to-end deep learning approaches to MT. Although deep neural models produce state-of-the-art results in many translation tasks, they are found to underperform in resource-poor scenarios. Despite some success, none of the present-day benchmarks that attempt to overcome this problem can be regarded as a universal solution for translating the many low-resource languages. In this work, we investigate the performance of phrase-based SMT (PB-SMT) and neural MT (NMT) on a rarely tested low-resource language pair, English-to-Tamil, taking a specialised data domain (software localisation) into consideration. In particular, we produce rankings of our MT systems via a social-media-platform-based human evaluation scheme, and present our findings on this low-resource, domain-specific text translation task.
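As an illustration of how system rankings can be derived from a human evaluation of this kind, the sketch below aggregates pairwise preference votes (such as those collected through a social media poll) into per-system win rates. The vote data and the aggregation rule are hypothetical, not the paper's actual scheme.

# Minimal sketch: derive a system ranking from pairwise human preferences.
from collections import defaultdict

# Each record: (system_a, system_b, winner) for one segment judged by one annotator.
# Hypothetical example data standing in for collected votes.
votes = [
    ("PB-SMT", "NMT", "NMT"),
    ("PB-SMT", "NMT", "PB-SMT"),
    ("PB-SMT", "NMT", "NMT"),
]

wins = defaultdict(int)
comparisons = defaultdict(int)
for sys_a, sys_b, winner in votes:
    comparisons[sys_a] += 1
    comparisons[sys_b] += 1
    wins[winner] += 1

# Rank by win rate: wins divided by the number of comparisons the system took part in.
ranking = sorted(comparisons, key=lambda s: wins[s] / comparisons[s], reverse=True)
for sys_name in ranking:
    print(f"{sys_name}: {wins[sys_name]}/{comparisons[sys_name]} wins")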