Shahram Khadivi


2021

Back-translation for Large-Scale Multilingual Machine Translation
Baohao Liao | Shahram Khadivi | Sanjika Hewavitharana
Proceedings of the Sixth Conference on Machine Translation

This paper describes our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). In this work, we aim to build a single multilingual translation system, under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained by the constrained sampling method, which differs from the finding for bilingual translation. In addition, we explore the effect of vocabularies and the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers only a modest improvement. We submitted to both small tasks and achieved second place.

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
Evgeniia Tokarchuk | Jan Rosendahl | Weiyue Wang | Pavel Petrushkov | Tomer Lancewicki | Shahram Khadivi | Hermann Ney
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used. Previous studies suggested several approaches for integrated end-to-end training to overcome those problems; however, they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French→German and German→Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French→German over the cascaded baseline.

2020

Diving Deep into Context-Aware Neural Machine Translation
Jingjing Huo | Christian Herold | Yingbo Gao | Leonard Dahlmann | Shahram Khadivi | Hermann Ney
Proceedings of the Fifth Conference on Machine Translation

Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-level NMT models on four diverse domains with a varied amount of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where the corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.

2019

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Yunsu Kim | Hendrik Rosendahl | Nick Rossenbach | Jan Rosendahl | Shahram Khadivi | Hermann Ney
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.

Generalizing Back-Translation in Neural Machine Translation
Miguel Graça | Yunsu Kim | Julian Schamper | Shahram Khadivi | Hermann Ney
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

Back-translation — data augmentation by translating target monolingual data — is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems of the sampling-based approaches and propose to remedy them by (i) disabling label smoothing for the target-to-source model and (ii) sampling from a restricted search space. Our statements are investigated on the WMT 2018 German <-> English news translation task.

Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages
Yunsu Kim | Petre Petrov | Pavel Petrushkov | Shahram Khadivi | Hermann Ney
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for different language pairs, 2) additional adapter component to smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods greatly outperform multilingual models up to +2.6% BLEU in WMT 2019 French-German and German-Czech tasks. We show that our improvements are valid also in zero-shot/zero-resource scenarios.

Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Colin Cherry | Greg Durrett | George Foster | Reza Haffari | Shahram Khadivi | Nanyun Peng | Xiang Ren | Swabha Swayamdipta
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

2018

Learning from Chunk-based Feedback in Neural Machine Translation
Pavel Petrushkov | Shahram Khadivi | Evgeny Matusov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We empirically investigate learning from partial feedback in neural machine translation (NMT), where partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.

Can Neural Machine Translation be Improved with User Feedback?
Julia Kreutzer | Shahram Khadivi | Evgeny Matusov | Stefan Riezler
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments—five-star ratings of translation quality—and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.

Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Reza Haffari | Colin Cherry | George Foster | Shahram Khadivi | Bahar Salehi
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Word-based Domain Adaptation for Neural Machine Translation
Shen Yan | Leonard Dahlmann | Pavel Petrushkov | Sanjika Hewavitharana | Shahram Khadivi
Proceedings of the 15th International Conference on Spoken Language Translation

In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. In order to mine in-domain-like words in the out-of-domain datasets, we compute word weights by using a domain-specific and a non-domain-specific language model, followed by smoothing and binary quantization. The baseline model is trained on mixed in-domain and out-of-domain datasets. Experimental results on En → Zh e-commerce domain translation show that, compared to continuing training without word weights, training with word weights improves MT quality by up to 3.11% BLEU absolute and 1.59% TER. We have also trained models using fine-tuning on the in-domain data. Pre-training a model with word weights improves fine-tuning by up to 1.24% BLEU absolute and 1.64% TER.

2017

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search
Leonard Dahlmann | Evgeny Matusov | Pavel Petrushkov | Shahram Khadivi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German-to-English news domain and English-to-Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.

2016

Guided Alignment Training for Topic-Aware Neural Machine Translation
Wenhu Chen | Evgeny Matusov | Shahram Khadivi | Jan-Thorsten Peter
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts such as topic or category information can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.

2015

A Generative Model for Extracting Parallel Fragments from Comparable Documents
Somayeh Bakhshaei | Shahram Khadivi | Reza Safabakhsh
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

Improved search strategy for interactive predictions in computer-assisted translation
Fatemeh Azadi | Shahram Khadivi
Proceedings of Machine Translation Summit XV: Papers

2014

Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data
Mohammad Aliannejadi | Masoud Kiaeeha | Shahram Khadivi | Saeed Shiry Ghidary
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Meta-level Statistical Machine Translation
Sajad Ebrahimi | Kourosh Meshgi | Shahram Khadivi | Mohammad Ebrahim Shiri Ahmad Abady
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Using Context Vectors in Improving a Machine Translation System with Bridge Language
Samira Tofighi Zahabi | Somayeh Bakhshaei | Shahram Khadivi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus
Fattaneh Jabbari | Somayeh Bakshaei | Seyyed Mohammad Mohammadzadeh Ziabary | Shahram Khadivi
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages

The translation quality of Statistical Machine Translation (SMT) depends on the amount of input data, especially for morphologically rich languages. Farsi (Persian) is such a language, and it has few NLP resources. It also suffers from non-standard written characters, which cause a large variety in the written form of each character. Moreover, the structural difference between Farsi and English results in long-range reorderings which cannot be modeled by common SMT reordering models. Here, we try to improve the existing English-Farsi SMT system by focusing on these challenges, first by expanding our bilingual limited-domain corpus to an open-domain one. Then, to alleviate the character variations, a new text normalization algorithm is offered. Finally, some hand-crafted rules are applied to reduce the structural differences. Using the new corpus, the experimental results showed an 8.82% BLEU improvement from applying the new normalization method and 9.1% BLEU when rules are used.

A New Search Approach for Interactive-Predictive Computer-Assisted Translation
Zeinab Vakil | Shahram Khadivi
Proceedings of COLING 2012: Posters

Interactive-predictive speech-enabled computer-assisted translation
Shahram Khadivi | Zeinab Vakil
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

In this paper, we study the incorporation of statistical machine translation models into automatic speech recognition models in the framework of computer-assisted translation. The system is given a source language text to be translated, and it shows the source text to the human translator, who translates it orally. The system captures the user's speech, which is the dictation of the target language sentence. Then, the human translator uses an interactive-predictive process to correct the system-generated errors. We show the efficiency of this method through a higher human productivity gain compared to the baseline systems: a pure ASR system and integrated ASR and MT systems.

A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
Mahdi Khademian | Kaveh Taghipour | Saab Mansour | Shahram Khadivi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Achieving accurate translation with statistical machine translation systems, especially for multiple-domain documents, requires more and more bilingual text, and this need becomes more critical when training such systems for language pairs with scarce training data. In recent years, there has been some research on new sources of parallel text, namely documents that are not necessarily parallel but are comparable. Since these methods search for possible translation equivalences in a greedy manner, they are unable to consider all possible parallel texts in comparable documents. This paper investigates a different approach to this need by considering relationships between all words of two comparable documents, which works fairly well even in the worst case of comparability. We represent each document pair in a matrix and then transform it to a new space to find parallel fragments. Evaluations show that the system is successful in extracting useful fragment pairs.

2011

Parallel Corpus Refinement as an Outlier Detection Algorithm
Kaveh Taghipour | Shahram Khadivi | Jia Xu
Proceedings of Machine Translation Summit XIII: Papers

An Unsupervised Alignment Model for Sequence Labeling: Application to Name Transliteration
Najmeh Mousavi Nejad | Shahram Khadivi
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

The Amirkabir Machine Transliteration System for NEWS 2011: Farsi-to-English Task
Najmeh Mousavi Nejad | Shahram Khadivi | Kaveh Taghipour
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

2010

WordNet Based Features for Predicting Brain Activity associated with meanings of nouns
Ahmad Babaeian Jelodar | Mehrdad Alizadeh | Shahram Khadivi
Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics

2009

Statistical Approaches to Computer-Assisted Translation
Sergio Barrachina | Oliver Bender | Francisco Casacuberta | Jorge Civera | Elsa Cubel | Shahram Khadivi | Antonio Lagarda | Hermann Ney | Jesús Tomás | Enrique Vidal | Juan-Miguel Vilar
Computational Linguistics, Volume 35, Number 1, March 2009

2007

A Sequence Alignment Model Based on the Averaged Perceptron
Dayne Freitag | Shahram Khadivi
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

A Flexible Architecture for CAT Applications
Saša Hasan | Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the 11th Annual conference of the European Association for Machine Translation

Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
Anas El Isbihani | Shahram Khadivi | Oliver Bender | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

Integration of Speech to Computer-Assisted Translation Using Finite-State Automata
Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

The RWTH Phrase-based Statistical Machine Translation System
Richard Zens | Oliver Bender | Sasa Hasan | Shahram Khadivi | Evgeny Matusov | Jia Xu | Yuqi Zhang | Hermann Ney
Proceedings of the Second International Workshop on Spoken Language Translation