Sylwia Ozdowska

2010

Inferring Syntactic Rules for Word Alignment through Inductive Logic Programming
Sylwia Ozdowska | Vincent Claveau
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents and evaluates an original approach to automatically aligning bitexts at the word level. It relies on a syntactic dependency analysis of the source and target texts and is based on a machine-learning technique, namely inductive logic programming (ILP). We show that ILP is particularly well suited to this task, in which the data can only be expressed by (translational and syntactic) relations. It allows us to easily infer rules, called syntactic alignment rules, that make the most of the syntactic information to align words. A simple bootstrapping technique provides the examples needed by ILP, making this machine-learning approach entirely automatic. Moreover, through different experiments, we show that this approach requires a very small amount of training data and that its performance rivals some of the best existing alignment systems. Furthermore, cases of syntactic isomorphism or non-isomorphism between the source language and the target language are easily identified through the inferred rules.

2009

Optimal Bilingual Data for French-English PB-SMT
Sylwia Ozdowska | Andy Way
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Données bilingues pour la TAS français-anglais : impact de la langue source et direction de traduction originales sur la qualité de la traduction
Sylwia Ozdowska
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Prise de position

In this paper, we take a position on the quality of bilingual data intended for statistical machine translation, in terms of original source language and translation direction, with respect to a French-English translation task. We show that training on a corpus containing texts that were originally translated from French into English improves translation quality. Conversely, training on a corpus containing exclusively texts whose original source language is neither French nor English degrades translation.

Tracking Relevant Alignment Characteristics for Machine Translation
Patrik Lambert | Yanjun Ma | Sylwia Ozdowska | Andy Way
Proceedings of Machine Translation Summit XII: Posters

2008

Cross-Corpus Evaluation of Word Alignment
Sylwia Ozdowska
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the procedures we implemented to carry out system-oriented evaluation of a syntax-based word aligner, ALIBI. While cross-corpus evaluation is still relatively rare in NLP, we regard it as part of system-oriented evaluation. Our hypothesis is that the granularity of alignments and the level of syntactic correspondence depend on corpus type; our objective is to assess how this affects alignment quality. We test our system on three English-French parallel corpora. The evaluation procedures are defined in accordance with state-of-the-art word alignment evaluation principles. They include, for each corpus, the creation of a reference set containing multiple annotations of the same data, the assessment of inter-annotator agreement rates and an analysis of the reference set obtained. We show that alignment performance varies across corpora according to the multiple reference annotations produced and further motivate our choice of preserving all reference annotations without resolving disagreements between annotators.

Comparing Constituency and Dependency Representations for SMT Phrase-Extraction
Mary Hearne | Sylwia Ozdowska | John Tinsley
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields higher translation quality than constituency annotation for PB-SMT.

MaTrEx: The DCU MT System for WMT 2008
John Tinsley | Yanjun Ma | Sylwia Ozdowska | Andy Way
Proceedings of the Third Workshop on Statistical Machine Translation

Improving Word Alignment Using Syntactic Dependencies
Yanjun Ma | Sylwia Ozdowska | Yanli Sun | Andy Way
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

This paper presents and evaluates an original and effective approach to automatically aligning a bitext at the word level. To this end, the approach draws on a syntactic dependency analysis of the bitexts performed by the SYNTEX tools and uses a machine-learning technique, inductive logic programming, to automatically learn so-called propagation rules. These rules rely on the known syntactic information to then align words with high precision. The method is entirely automatic, and results evaluated on the data from the HLT alignment campaign show that it is comparable to the best existing techniques. Moreover, whereas the latter require several million sentences for training, our approach requires only a few hundred. Finally, examining the inferred propagation rules makes it easy to identify cases of syntactic isomorphism and non-isomorphism between the two languages processed.

2004

Appariement bilingue de mots par propagation syntaxique à partir de corpus français/anglais alignés
Sylwia Ozdowska
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

We present a word-matching method, based on aligned French/English corpora, that relies on the syntactic dependency analysis of sentences. First, words are matched at a global level by computing co-occurrence frequencies in aligned sentences. These words form the anchor pairs that serve as the starting point for propagating matching links by means of the various dependency relations identified by a syntactic parser in each of the two languages. For the time being, this method, referred to as local matching, mainly handles cases of parallelism, that is, cases where the syntactic relations are identical in both languages and the matched words belong to the same category. It achieves a success rate of 95.4% across all relations.

Identifying Correspondences Between Words: an Approach Based on a Bilingual Syntactic Analysis of French/English Parallel Corpora
Sylwia Ozdowska
Proceedings of the Workshop on Multilingual Linguistic Resources