Arul Menezes


2021

The Curious Case of Hallucinations in Neural Machine Translation
Vikas Raunak | Arul Menezes | Marcin Junczys-Dowmunt
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman, and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation. We have released the datasets and code to replicate our results.

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
Tom Kocmi | Christian Federmann | Roman Grundkiewicz | Marcin Junczys-Dowmunt | Hitokazu Matsushita | Arul Menezes
Proceedings of the Sixth Conference on Machine Translation

Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system’s quality over another. The community’s choice of automatic metric guides research directions and industrial developments by deciding which models are deemed better. Evaluating metrics’ correlations with sets of human judgements has been limited by the size of these sets. In this paper, we corroborate how reliable metrics are in contrast to human judgements on, to the best of our knowledge, the largest collection of judgements reported in the literature. Arguably, pairwise rankings of two systems are the most common evaluation tasks in research or deployment scenarios. Taking human judgement as a gold standard, we investigate which metrics have the highest accuracy in predicting translation quality rankings for such system pairs. Furthermore, we evaluate the performance of various metrics across different language pairs and domains. Lastly, we show that the sole use of BLEU impeded the development of improved models, leading to bad deployment decisions. We release the collection of 2.3M sentence-level human judgements for 4380 systems for further analysis and replication of our work.

2015

Pre-Computable Multi-Layer Neural Network Language Models
Jacob Devlin | Chris Quirk | Arul Menezes
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus
Lucy Vanderwende | Arul Menezes | Chris Quirk
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

Speech translation for everyone – breaking down the barriers
Arul Menezes
Proceedings of the 11th International Workshop on Spoken Language Translation: Keynotes

2013

Social Text Normalization using Contextual Graph Random Walks
Hany Hassan | Arul Menezes
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2008

Syntactic Models for Structural Word Insertion and Deletion during Translation
Arul Menezes | Chris Quirk
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Using Dependency Order Templates to Improve Generality in Translation
Arul Menezes | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

Generative models of noisy translations with applications to parallel fragment extraction
Chris Quirk | Raghavendra Udupa U. | Arul Menezes
Proceedings of Machine Translation Summit XI: Papers

2006

Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation
Chris Quirk | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Effectively Using Syntax for Recognizing False Entailment
Rion Snow | Lucy Vanderwende | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation
Arul Menezes | Kristina Toutanova | Chris Quirk
Proceedings on the Workshop on Statistical Machine Translation

2005

MindNet: An Automatically-Created Lexical Resource
Lucy Vanderwende | Gary Kacmarcik | Hisami Suzuki | Arul Menezes
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

Dependency Treelet Translation: Syntactically Informed Phrasal SMT
Chris Quirk | Arul Menezes | Colin Cherry
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Microsoft Research Treelet Translation System: IWSLT Evaluation
Arul Menezes | Chris Quirk
Proceedings of the Second International Workshop on Spoken Language Translation

Dependency Treelet Translation: The Convergence of Statistical and Example-based Machine-translation?
Arul Menezes | Chris Quirk
Workshop on example-based machine translation

We describe a novel approach to machine translation that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models, to combine the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

2004

Statistical machine translation using labeled semantic dependency graphs
Anthony Aue | Arul Menezes | Bob Moore | Chris Quirk | Eric Ringger
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2002

Better contextual translation using machine learning
Arul Menezes
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

One of the problems facing translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora is the trade-off between contextual specificity and general applicability of the mappings, which typically results in conflicting mappings without distinguishing context. We present a machine-learning approach to choosing between such mappings, using classifiers that, in effect, selectively expand the context for these mappings using features available in a linguistic representation of the source language input. We show that using these classifiers in our machine translation system significantly improves the quality of the translated output. Additionally, the set of distinguishing features selected by the classifiers provides insight into the relative importance of the various linguistic features in choosing the correct contextual translation.

English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations
Chris Brockett | Takako Aikawa | Anthony Aue | Arul Menezes | Chris Quirk | Hisami Suzuki
COLING-02: Machine Translation in Asia

2001

Achieving commercial-quality translation with example-based methods
Stephen Richardson | William Dolan | Arul Menezes | Jessie Pinkham
Proceedings of Machine Translation Summit VIII

We describe MSR-MT, a large-scale example-based machine translation system under development for several language pairs. When the system is trained on aligned English-Spanish technical prose, a blind evaluation shows that MSR-MT’s integration of rule-based parsers, example-based processing, and statistical techniques produces translations whose quality in this domain exceeds that of uncustomized commercial MT systems.

A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora
Arul Menezes | Stephen D. Richardson
Workshop on Example-Based Machine Translation

Translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora have been hampered by the difficulty of achieving accurate alignment and acquiring high-quality mappings. We describe an algorithm that uses a best-first strategy and a small alignment grammar to significantly improve the quality of the mappings extracted. For each mapping, frequencies are computed and sufficient context is retained to distinguish competing mappings during translation. Variants of the algorithm are run against a corpus containing 200K sentence pairs and evaluated on the quality of the resulting translations.

Overcoming the customization bottleneck using example-based MT
Stephen D. Richardson | William B. Dolan | Arul Menezes | Monica Corston-Oliver
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora
Arul Menezes | Stephen D. Richardson
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation