Nadi Tomeh


2021

Proceedings of the Sixth Arabic Natural Language Processing Workshop
Nizar Habash | Houda Bouamor | Hazem Hajj | Walid Magdy | Wajdi Zaghouani | Fethi Bougares | Nadi Tomeh | Ibrahim Abu Farha | Samia Touileb
Proceedings of the Sixth Arabic Natural Language Processing Workshop

2020

Proceedings of the Fifth Arabic Natural Language Processing Workshop
Imed Zitouni | Muhammad Abdul-Mageed | Houda Bouamor | Fethi Bougares | Mahmoud El-Haj | Nadi Tomeh | Wajdi Zaghouani
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations
Yash Kankanampati | Joseph Le Roux | Nadi Tomeh | Dima Taji | Nizar Habash
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we present a parsing model for projective dependency trees that takes advantage of complementary dependency annotations, as is the case for Arabic, where both CATiB and UD treebanks are available. Our system parses according to both annotation types jointly, as a sequence of arc-creating operations, and the partially built trees for one annotation are available to the other as features for the scoring function. This method yields an error reduction of 9.9% on CATiB and 6.1% on UD compared to a strong baseline. Ablation tests show that this reduction comes mainly from sharing tree representations between tasks, and not simply from sharing BiLSTM layers, as is often done in NLP multitask systems.

2019

Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani
Proceedings of the Fourth Arabic Natural Language Processing Workshop

2017

Proceedings of the Third Arabic Natural Language Processing Workshop
Nizar Habash | Mona Diab | Kareem Darwish | Wassim El-Hajj | Hend Al-Khalifa | Houda Bouamor | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani
Proceedings of the Third Arabic Natural Language Processing Workshop

2016

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework
Matthieu Constant | Joseph Le Roux | Nadi Tomeh
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Fouille de motifs et CRF pour la reconnaissance de symptômes dans les textes biomédicaux (Pattern mining and CRF for symptoms recognition in biomedical texts)
Pierre Holat | Nadi Tomeh | Thierry Charnois | Delphine Battistelli | Marie-Christine Jaulent | Jean-Philippe Métivier
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)

In this paper, we address the extraction of symptom-type medical entities from biomedical texts. This task has received little attention in the literature and, to our knowledge, no annotated corpus exists for training a learning model. We propose two weakly supervised approaches to extract these entities. The first is based on pattern mining and introduces a new semantic similarity constraint. The second formulates the task as a sequence labeling task using CRFs (conditional random fields). We describe the experiments we conducted, which show that the two approaches are complementary in terms of quantitative evaluation (recall and precision). We further show that combining them substantially improves the results.

2015

Classification de texte enrichie à l’aide de motifs séquentiels (Text classification enriched with sequential patterns)
Pierre Holat | Nadi Tomeh | Thierry Charnois
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In text classification, most methods based on statistical classifiers use words, or combinations of contiguous words, as features. Taking more information into account causes the number of non-contiguous features to grow exponentially. To counter this growth, sequential pattern mining makes it possible to efficiently extract a reduced number of features that are both frequent and relevant, thanks to the use of constraints. In this paper, we compare the use of frequent patterns under constraints with the use of δ-free patterns as features. We show the advantages and drawbacks of each type of pattern.

2014

Large Scale Arabic Error Annotation: Guidelines and Framework
Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Ossama Obeid | Nadi Tomeh | Alla Rozovskaya | Noura Farra | Sarah Alkuhlani | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

A Pipeline Approach to Supervised Error Correction for the QALB-2014 Shared Task
Nadi Tomeh | Nizar Habash | Ramy Eskander | Joseph Le Roux
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

Ontology-based Technical Text Annotation
François Lévy | Nadi Tomeh | Yue Ma
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language

Generalized Character-Level Spelling Error Correction
Noura Farra | Nadi Tomeh | Alla Rozovskaya | Nizar Habash
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient
Davide Buscaldi | Jorge García Flores | Joseph Le Roux | Nadi Tomeh | Belém Priego Sanchez
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Morphological Analysis and Disambiguation for Dialectal Arabic
Nizar Habash | Ryan Roth | Owen Rambow | Ramy Eskander | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Processing Spontaneous Orthography
Ramy Eskander | Nizar Habash | Owen Rambow | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Web-based Annotation Framework For Large-Scale Text Correction
Ossama Obeid | Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Kemal Oflazer | Nadi Tomeh
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition
Nadi Tomeh | Nizar Habash | Ryan Roth | Noura Farra | Pradeep Dasigi | Mona Diab
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Improving Relative-Entropy Pruning using Statistical Significance
Wang Ling | Nadi Tomeh | Guang Xiang | Isabel Trancoso | Alan Black
Proceedings of COLING 2012: Posters

HadoopPerceptron: a Toolkit for Distributed Perceptron Training and Prediction with MapReduce
Andrea Gesmundo | Nadi Tomeh
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Discriminative Weighted Alignment Matrices For Statistical Machine Translation
Nadi Tomeh | Alexandre Allauzen | François Yvon
Proceedings of the 15th Annual conference of the European Association for Machine Translation

How good are your phrases? Assessing phrase quality with single class classification
Nadi Tomeh | Marco Turchi | Guillaume Wisniewski | Alexandre Allauzen | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

We present a novel translation-quality-informed procedure for both extraction and scoring of phrase pairs in PBSMT systems. We reformulate the extraction problem in the supervised learning framework. Our goal is twofold: first, we attempt to take translation quality into account; second, we incorporate arbitrary features in order to circumvent alignment errors. One-Class SVMs and the Mapping Convergence algorithm permit training a single-class classifier to discriminate between useful and useless phrase pairs. Such a classifier can be learned from a training corpus that comprises only useful instances. The confidence score produced by the classifier for each phrase pair is employed as a selection criterion. The smoothness of these scores allows fine control over the size of the resulting translation model. Finally, confidence scores provide a new accuracy-based feature to score phrase pairs. Experimental evaluation of the method shows accurate assessments of phrase-pair quality, even for regions in the space of possible phrase pairs that are ignored by other approaches. This enhanced evaluation of phrase pairs leads to improvements in translation performance as measured by BLEU.

Estimation d’un modèle de traduction à partir d’alignements mot-à-mot non-déterministes (Estimating a translation model from non-deterministic word-to-word alignments)
Nadi Tomeh | Alexandre Allauzen | François Yvon
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In phrase-based statistical machine translation systems, the translation model is estimated from word-to-word alignments using extraction and scoring heuristics. Although these word alignments are built by probabilistic models, the extraction and scoring processes use these models under the assumption that the alignments are deterministic. In this paper, we propose to lift this assumption by considering the entire alignment matrix of a sentence pair, with each association weighted by its probability. In comparison with previous work, we show that using an exponential model to estimate these probabilities discriminatively yields significant improvements in translation performance. These improvements are measured with the BLEU metric on the Arabic-to-English translation task of the NIST MT’09 evaluation, under two conditions that differ in the size of the parallel training corpus.

2010

Refining Word Alignment with Discriminative Training
Nadi Tomeh | Alexandre Allauzen | François Yvon | Guillaume Wisniewski
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

The quality of statistical machine translation systems depends on the quality of the word alignments that are computed during the translation model training phase. IBM alignment models, as implemented in the GIZA++ toolkit, constitute the de facto standard for performing these computations. The resulting alignments and translation models are, however, very noisy, and several authors have tried to improve them. In this work, we propose a simple and effective approach, which considers alignment as a series of independent binary classification problems in the alignment matrix. Through extensive feature engineering and the use of stacking techniques, we were able to obtain alignments much closer to manually defined references than those obtained by the IBM models. These alignments also yield better translation models, delivering improved performance in a large-scale Arabic-to-English translation task.
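
As a rough illustration of treating alignment as independent binary decisions over the alignment matrix, the following Python sketch decides each cell of a matrix of link probabilities on its own. The probabilities, the 0.5 threshold, and the function name `links_from_matrix` are assumptions made for illustration; in the paper the per-cell decisions come from a trained classifier with engineered features and stacking, which is not reproduced here.

```python
from typing import List, Tuple


def links_from_matrix(
    probs: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) pairs whose link probability
    reaches the threshold. Each cell is decided independently of the others."""
    return [
        (i, j)
        for i, row in enumerate(probs)
        for j, p in enumerate(row)
        if p >= threshold
    ]


# Hypothetical link probabilities for a 3-word source and a 3-word target sentence.
probs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.4],
]

links = links_from_matrix(probs)
print(links)
```

Because every cell is classified separately, lowering the threshold can only add links; nothing forces the result to be one-to-one.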

2009

Complexity-Based Phrase-Table Filtering for Statistical Machine Translation
Nadi Tomeh | Nicola Cancedda | Marc Dymetman
Proceedings of Machine Translation Summit XII: Papers