Fabrizio Gotti

2019

pdf bib abs
WiRe57 : A Fine-Grained Benchmark for Open Information Extraction
William Lechelle | Fabrizio Gotti | Phillippe Langlais
Proceedings of the 13th Linguistic Annotation Workshop

We build a reference for the task of Open Information Extraction, on five documents. We tentatively resolve a number of issues that arise, including coreference and granularity, and we take steps toward addressing inference, a significant problem. We seek to better pinpoint the requirements for the task. We produce our annotation guidelines specifying what is correct to extract and what is not. In turn, we use this reference to score existing Open IE systems. We address the non-trivial problem of evaluating the extractions produced by systems against the reference tuples, and share our evaluation script. Among seven compared extractors, we find the MinIE system to perform best.

2014

pdf
A Tool for the Automatic Insertion of Diacritics in French (Zodiac : Insertion automatique des signes diacritiques du français) [in French]
Fabrizio Gotti | Guy Lapalme
Proceedings of TALN 2014 (Volume 3: System Demonstrations)

pdf abs
Hashtag Occurrences, Layout and Translation: A Corpus-driven Analysis of Tweets Published by the Canadian Government
Fabrizio Gotti | Phillippe Langlais | Atefeh Farzindar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an aligned bilingual corpus of 8758 tweet pairs in French and English, derived from Canadian government agencies. Hashtags appear in a tweet’s prologue, announcing its topic, or in the tweet’s text in lieu of traditional words, or in an epilogue. Hashtags are words prefixed with a pound sign in 80% of the cases. The rest is mostly multiword hashtags, for which we describe a segmentation algorithm. A manual analysis of the bilingual alignment of 5000 hashtags shows that 5% (French) to 18% (English) of them don’t have a counterpart in their containing tweet’s translation. This analysis shows that 80% of multiword hashtags are correctly translated by humans, and that the mistranslation of the rest may be due to incomplete translation directives regarding social media. We show how these resources and their analysis can guide the design of a machine translation pipeline, and its evaluation. A baseline system implementing a tweet-specific tokenizer yields promising results. The system is improved by translating epilogues, prologues, and text separately. We attempt to feed the SMT engine with the original hashtag and some alternatives (“dehashed” version or a segmented version of multiword hashtags), but translation quality improves at the cost of hashtag recall.

Malgré les nombreuses études visant à améliorer la traduction automatique, la traduction assistée par ordinateur reste la solution préférée des traducteurs lorsqu’une sortie de qualité est recherchée. Cette démonstration vise à présenter le moteur de recherche de traductions TransSearch. Cetteapplication commerciale, accessible sur leWeb, repose d’une part sur l’exploitation d’un bitexte aligné au niveau des phrases, et d’autre part sur des modèles statistiques d’alignement de mots.

2008

pdf abs
Automatic Translation of Court Judgments
Fabrizio Gotti | Guy Lapalme | Elliott Macklovitch | Atefeh Farzindar
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

This document presents an experiment in the automatic translation of Canadian Court judgments from English to French and from French to English. We show that although the language used in this type of legal text is complex and specialized, an SMT system can produce intelligible and useful translations, provided that the system can be trained on a vast amount of legal text. We also describe the results of a human evaluation of the output of the system.

pdf abs
TransSearch: What are translators looking for?
Elliott Macklovitch | Guy Lapalme | Fabrizio Gotti
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

Notwithstanding machine translation’s impressive progress over the last decade, many translators remain convinced that the output of even the best MT systems is not sufficient to facilitate the production of publication-quality texts. To increase their productivity they turn instead to translator support tools. We examine the use of one such tool: TransSearch, an online bilingual concordancer. From the millions of requests stored in the system’s logs over a 6-year period, we extracted and analyzed the most frequently submitted queries, in an effort to characterize the kinds of problems for which translators turn to this system for help. What we discover, somewhat surprisingly, is that our system seems particularly well-suited to help translate highly polysemous adverbials and prepositional phrases.

pdf abs
Recherche locale pour la traduction statistique à base de segments
Philippe Langlais | Alexandre Patry | Fabrizio Gotti
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cette étude, nous nous intéressons à des algorithmes de recherche locale pour la traduction statistique à base de segments (phrase-based machine translation). Les algorithmes que nous étudions s’appuient sur une formulation complète d’un état dans l’espace de recherche contrairement aux décodeurs couramment utilisés qui explorent l’espace des préfixes des traductions possibles. Nous montrons que la recherche locale seule, permet de produire des traductions proches en qualité de celles fournies par les décodeurs usuels, en un temps nettement inférieur et à un coût mémoire constant. Nous montrons également sur plusieurs directions de traduction qu’elle permet d’améliorer de manière significative les traductions produites par le système à l’état de l’art Pharaoh (Koehn, 2004).

2007

pdf
A greedy decoder for phrase-based statistical machine translation
Philippe Langlais | Alexandre Patry | Fabrizio Gotti
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2006

pdf abs
De la Chambre des communes à la chambre d’isolement : adaptabilité d’un système de traduction basé sur les segments de phrases
Philippe Langlais | Fabrizio Gotti | Alexandre Patry
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous présentons notre participation à la deuxième campagne d’évaluation de CESTA, un projet EVALDA de l’action Technolangue. Le but de cette campagne consistait à tester l’aptitude des systèmes de traduction à s’adapter rapidement à une tâche spécifique. Nous analysons la fragilité d’un système de traduction probabiliste entraîné sur un corpus hors-domaine et dressons la liste des expériences que nous avons réalisées pour adapter notre système au domaine médical.

pdf abs
Vers l’intégration du contexte dans une mémoire de traduction sous-phrastique : détection du domaine de traduction
Fabrizio Gotti | Philippe Langlais | Claude Coulombe
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Nous présentons dans cet article une mémoire de traduction sous-phrastique sensible au domaine de traduction, une première étape vers l’intégration du contexte. Ce système est en mesure de recycler les traductions déjà « vues » par la mémoire, non seulement pour des phrases complètes, mais également pour des sous-séquences contiguës de ces phrases, via un aligneur de mots. Les séquences jugées intéressantes sont proposées au traducteur. Nous expliquons également la création d’un utilisateur artificiel, indispensable pour tester les performances du système en l’absence d’intervention humaine. Nous le testons lors de la traduction d’un ensemble disparate de corpus. Ces performances sont exprimées par un ensemble de métriques que nous définissons. Enfin, nous démontrons que la détection automatique du contexte de traduction peut s’avérer bénéfique et prometteuse pour améliorer le fonctionnement d’une telle mémoire, en agissant comme un filtre sur le matériel cible suggéré.

pdf
Phrase-Based SMT with Shallow Tree-Phrases
Philippe Langlais | Fabrizio Gotti
Proceedings on the Workshop on Statistical Machine Translation

pdf
Mood at work: Ramses versus Pharaoh
Alexandre Patry | Fabrizio Gotti | Philippe Langlais
Proceedings on the Workshop on Statistical Machine Translation

pdf abs
MOOD: A Modular Object-Oriented Decoder for Statistical Machine Translation
Alexandre Patry | Fabrizio Gotti | Philippe Langlais
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present an Open Source framework called MOOD developed in order tofacilitate the development of a Statistical Machine Translation Decoder.MOOD has been modularized using an object-oriented approach which makes itespecially suitable for the fast development of state-of-the-art decoders. Asa proof of concept, a clone of the pharaoh decoder has been implemented andevaluated. This clone named ramses is part of the current distribution of MOOD.

2005

pdf abs
EBMT by Tree-Phrasing: a Pilot Study
Philippe Langlais | Fabrizio Gotti | Didier Bourigault | Claude Coulombe
Workshop on example-based machine translation

We present a study we conducted to build a repository storing associations between simple dependency treelets in a source language and their corresponding phrases in a target language. To assess the impact of this resource in EBMT, we used the repository to compute coverage statistics on a test bitext and on a n-best list of translation candidates produced by a standard phrase-based decoder.

pdf
NUKTI: English-Inuktitut Word Alignment System Description
Philippe Langlais | Fabrizio Gotti | Guihong Cao
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf
RALI: SMT Shared Task System Description
Philippe Langlais | Guihong Cao | Fabrizio Gotti
Proceedings of the ACL Workshop on Building and Using Parallel Texts