Pratyush Banerjee


2013

pdf
Quality Estimation-guided Data Selection for Domain Adaptation of SMT
Pratyush Banerjee | Raphael Rubino | Johann Roturier | Josef van Genabith
Proceedings of Machine Translation Summit XIV: Papers

2012

pdf
Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

pdf
Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of COLING 2012

2011

pdf
The DCU machine translation systems for IWSLT 2011
Pratyush Banerjee | Hala Almaghout | Sudip Naskar | Johann Roturier | Jie Jiang | Andy Way | Josef van Genabith
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we provide a description of the Dublin City University’s (DCU) submissions in the IWSLT 2011 evaluationcampaign.1 WeparticipatedintheArabic-Englishand Chinese-English Machine Translation(MT) track translation tasks. We use phrase-based statistical machine translation (PBSMT) models to create the baseline system. Due to the open-domain nature of the data to be translated, we use domain adaptation techniques to improve the quality of translation. Furthermore, we explore target-side syntactic augmentation for an Hierarchical Phrase-Based (HPB) SMT model. Combinatory Categorial Grammar (CCG) is used to extract labels for target-side phrases and non-terminals in the HPB system. Combining the domain adapted language models with the CCG-augmented HPB system gave us the best translations for both language pairs providing statistically significant improvements of 6.09 absolute BLEU points (25.94% relative) and 1.69 absolute BLEU points (15.89% relative) over the unadapted PBSMT baselines for the Arabic-English and Chinese-English language pairs, respectively.

pdf
Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component Level Mixture Modelling
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of Machine Translation Summit XIII: Papers

2010

pdf
Combining Multi-Domain Statistical Machine Translation Models using Automatic Classifiers
Pratyush Banerjee | Jinhua Du | Baoli Li | Sudip Naskar | Andy Way | Josef van Genabith
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper presents a set of experiments on Domain Adaptation of Statistical Machine Translation systems. The experiments focus on Chinese-English and two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved translation quality for both domain-specific as well as combined sets of sentences. We train a statistical classifier to classify sentences according to the appropriate domain and utilize the corresponding domain-specific MT models to translate them. Experimental results show that the method achieves a statistically significant absolute improvement of 1.58 BLEU (2.86% relative improvement) score over a translation model trained on combined data, and considerable improvements over a model using multiple decoding paths of the Moses decoder, for the combined domain test set. Furthermore, even for domain-specific test sets, our approach works almost as well as dedicated domain-specific models and perfect classification.

pdf
MATREX: The DCU MT System for WMT 2010
Sergio Penkale | Rejwanul Haque | Sandipan Dandapat | Pratyush Banerjee | Ankit K. Srivastava | Jinhua Du | Pavel Pecina | Sudip Kumar Naskar | Mikel L. Forcada | Andy Way
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2008

pdf
Bengali and Hindi to English CLIR Evaluation
Debasis Mandal | Sandipan Dandapat | Mayank Gupta | Pratyush Banerjee | Sudeshna Sarkar
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies