Ahmed Tawfik


2020

Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions
Muhammad ElNokrashy | Amr Hendy | Mohamed Abdelghaffar | Mohamed Afify | Ahmed Tawfik | Hany Hassan Awadalla
Proceedings of the Fifth Conference on Machine Translation

This paper describes our submission to the WMT20 sentence filtering task. We combine scores from a custom LASER model built for each source language, a classifier built to distinguish positive and negative pairs, and the original scores provided with the task. For the mBART setup provided by the organizers, our method shows 7% and 5% relative improvement over the baseline in sacreBLEU score on the test set for Pashto and Khmer, respectively.

2019

Morphology-aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation
Ahmed Tawfik | Mahitab Emam | Khaled Essam | Robert Nabil | Hany Hassan
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. This scarcity has prompted the use of abundant Modern Standard Arabic (MSA) resources to complement the limited dialectal resources. However, clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation to other word segmentation approaches such as Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian Arabic (EA), Levantine Arabic (LA), and Gulf Arabic (GA) shows that a sufficiently accurate morphology-aware segmentation used in conjunction with BPE outperforms the other word segmentation approaches.