2020
A human evaluation of English-Irish statistical and neural machine translation
Meghan Dowling, Sheila Castilho, Joss Moorkens, Teresa Lynn, Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
With Irish holding official status in both Ireland and the EU, there is a need for high-quality English-Irish (EN-GA) machine translation (MT) systems suitable for use in a professional translation environment. While there has been recent research on improving both statistical MT and neural MT for the EN-GA pair, the results of such systems have so far been reported only with automatic evaluation metrics. This paper provides the first human evaluation study of EN-GA MT, using professional translators and in-domain (public administration) data, for a more accurate depiction of the translation quality available via MT.
2019
Leveraging backtranslation to improve machine translation for Gaelic languages
Meghan Dowling, Teresa Lynn, Andy Way
Proceedings of the Celtic Language Technology Workshop
Improving full-text search results on dúchas.ie using language technology
Brian Ó Raghallaigh, Kevin Scannell, Meghan Dowling
Proceedings of the Celtic Language Technology Workshop
2018
SMT versus NMT: Preliminary comparisons for Irish
Meghan Dowling, Teresa Lynn, Alberto Poncelas, Andy Way
Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018)
2016
Is all that Glitters in Machine Translation Quality Estimation really Gold?
Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment. Human-targeted translation edit rate (HTER) is by far the most widely used human-targeted metric in machine translation, commonly employed, for example, as a gold standard in the evaluation of quality estimation. However, the original experiments justifying the design of HTER, as opposed to other possible formulations, were limited to a small sample of translations and a single language pair, and this motivates our re-evaluation of a range of human-targeted metrics on a substantially larger scale. Results show significantly stronger correlation with human judgment for HBLEU over HTER for two of the nine language pairs we include, and no significant difference between the correlations achieved by HTER and HBLEU for the remaining language pairs. Finally, we evaluate a range of quality estimation systems employing HTER and direct assessment (DA) of translation adequacy as gold labels, resulting in a divergence in system rankings, and propose the use of DA for future quality estimation evaluations.
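To illustrate the kind of metric computation and correlation analysis this abstract refers to, the sketch below is a minimal, hypothetical example (not the paper's code or data): a simplified word-level edit rate in the spirit of TER/HTER, computed against a human post-edited reference but without TER's shift operation, followed by a Pearson correlation of those segment-level scores with direct assessment (DA) scores. All segments, post-edits, and DA scores are invented for illustration.

# Minimal sketch: a TER-style word edit rate against a post-edited reference
# (the idea behind HTER, simplified: no shift operation) and its Pearson
# correlation with segment-level direct assessment (DA) scores.
# All example data below are invented for illustration only.

def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # classic dynamic-programming edit distance over word tokens
    dist = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dist[i][0] = i
    for j in range(len(ref) + 1):
        dist[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution
    return dist[len(hyp)][len(ref)] / max(len(ref), 1)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical MT outputs, their human post-edits, and DA adequacy scores.
mt_outputs = ["tá an doiciméad ar fáil",
              "níl aon eolas anseo",
              "cuir isteach an fhoirm seo"]
post_edits = ["tá an doiciméad seo ar fáil",
              "níl aon eolas nua anseo",
              "cuir isteach an fhoirm seo"]
da_scores = [80.0, 65.0, 95.0]

hter_like = [word_edit_rate(h, r) for h, r in zip(mt_outputs, post_edits)]
print(hter_like)                       # segment-level edit rates
print(pearson(hter_like, da_scores))   # expected to be negative: fewer edits, higher adequacy

A correlation computed this way (metric score against human judgment, per segment) is the comparison the abstract describes when weighing HTER and HBLEU against direct assessment; the real evaluation uses the full TER definition, post-edits produced by professional translators, and many more segments.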