Fatemeh Azadi


2026

High-quality domain-specific parallel corpora play a significant role in improving the performance of machine translation (MT) and multilingual natural language processing (NLP) systems in a target domain. However, most existing multilingual parallel corpora focus on general-purpose data, and most highly specialized domains, such as forced migration, suffer from a lack of multilingual data. In this work, we present a new high-quality 4-way parallel corpus in the forced migration domain. The corpus consists of human-translated journal articles from Forced Migration Review in English, French, Spanish, and Arabic. It contains data aligned at both the document and sentence levels in all four languages and provides a clean and reliable 4-way parallel resource for multilingual research in forced migration. Using this dataset, we benchmark several open-weight large language models (LLMs), an open-weight multilingual MT system, online closed MT systems, and a closed LLM across 12 translation directions. We further leverage our corpus to improve the MT quality of a top-performing multilingual foundation model with two common domain adaptation approaches: fine-tuning and few-shot prompting. Our results demonstrate the effectiveness of our corpus in improving the translation performance of current models in the forced migration domain.

2024

Translation quality estimation (QE) is an important component in real-world machine translation applications. Unfortunately, human-labeled QE datasets, which play an important role in developing and assessing QE models, are available for only a limited number of language pairs. In this paper, we present the first English-Persian QE dataset, called EPOQUE, which has manually annotated direct assessment labels. EPOQUE contains 1000 sentences translated from English to Persian and annotated by three human annotators. It is publicly available, and thus can be used as a zero-shot test set, or for other scenarios in future work. We also evaluate and report the performance of two state-of-the-art QE models, i.e., Transquest and CometKiwi, as baselines on our dataset. Furthermore, our experiments show that using a small subset of the proposed dataset containing 300 sentences to fine-tune Transquest can improve its performance by more than 8% in terms of the Pearson correlation with a held-out test set.
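As a reminder of the evaluation measure named above, the following is a minimal sketch of the Pearson correlation between model-predicted quality scores and human direct assessment scores. The function name `pearson` and its inputs are illustrative assumptions, not part of EPOQUE or of any QE toolkit.

```python
import math


def pearson(predicted, human):
    """Pearson correlation between two equal-length score lists.

    `predicted` and `human` are hypothetical names: model QE scores and
    human direct assessment scores for the same sentences.
    """
    n = len(predicted)
    if n != len(human) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_h)
```

A value near 1 means the model ranks and scales translations much as the annotators do; a value near 0 means its scores carry little linear information about human judgments.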

2023

We report the results of the WMT 2023 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. This edition introduces a few novel aspects and extensions that aim to enable more fine-grained and explainable quality estimation approaches. We introduce an updated quality annotation scheme using Multidimensional Quality Metrics to obtain sentence- and word-level quality scores for three language pairs. We also extend the provided data to new language pairs: we specifically target low-resource languages and provide training, development and test data for English-Hindi, English-Tamil, English-Telugu and English-Gujarati as well as a zero-shot test set for English-Farsi. Further, we introduce a novel fine-grained error prediction task aspiring to motivate research towards more detailed quality predictions.
Word alignment has many applications, including cross-lingual annotation projection, bilingual lexicon extraction, and the evaluation or analysis of translation outputs. Recent studies show that contextualized embeddings from pre-trained multilingual language models can yield high-quality word alignments without the need for parallel training data. In this work, we propose PMI-Align, which computes and uses the point-wise mutual information between source and target tokens to extract word alignments, instead of the cosine similarity or dot product mostly used in recent approaches. Our experiments show that the proposed PMI-Align approach can outperform rival methods on five out of six language pairs. Although our approach requires no parallel training data, we show that it can also benefit approaches that use parallel data to fine-tune pre-trained language models on word alignment. Our code and data are publicly available.
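To make the idea of scoring token pairs by point-wise mutual information concrete, here is a minimal sketch. It is not the PMI-Align implementation: the joint distribution (a softmax over all dot products), the function names `pmi_matrix` and `align`, and the mutual-argmax extraction rule are all assumptions chosen for illustration.

```python
import numpy as np


def pmi_matrix(src_vecs, tgt_vecs):
    """PMI score for every (source token, target token) pair.

    Assumption: the joint probability p(i, j) is a softmax over all
    dot products between the contextual embeddings of the two sentences.
    """
    sim = src_vecs @ tgt_vecs.T            # shape: (n_src, n_tgt)
    sim = sim - sim.max()                  # shift for numerical stability
    joint = np.exp(sim)
    joint /= joint.sum()                   # p(i, j), sums to 1
    p_src = joint.sum(axis=1, keepdims=True)   # marginal p(i)
    p_tgt = joint.sum(axis=0, keepdims=True)   # marginal p(j)
    # PMI(i, j) = log p(i, j) - log p(i) - log p(j)
    return np.log(joint) - np.log(p_src) - np.log(p_tgt)


def align(pmi):
    """Keep pairs that are each other's best match (assumed rule)."""
    best_tgt = pmi.argmax(axis=1)          # best target for each source
    best_src = pmi.argmax(axis=0)          # best source for each target
    return [(i, int(j)) for i, j in enumerate(best_tgt) if best_src[j] == i]
```

Compared with thresholding raw similarities, subtracting the log marginals discounts tokens that are similar to everything in the other sentence, which is the intuition behind preferring PMI to a plain dot product.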

2015