Nicola Cancedda


2014

Fast Domain Adaptation of SMT models without in-Domain Parallel Data
Prashant Mathur | Sriram Venkatapathy | Nicola Cancedda
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Assessing quick update methods of statistical translation models
Shachar Mirkin | Nicola Cancedda
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

The ability to quickly incorporate incoming training data into a running translation system is critical in a number of applications. Mechanisms based on incremental model update and the online EM algorithm hold the promise of achieving this objective in a principled way. Still, efficient tools for incremental training have yet to become available. In this paper we experiment with simple alternative solutions for interim model updates within the popular Moses system. Although they fall short of updating the model in real time, such updates can be executed in short timeframes even on large models, and achieve a performance level close to, and in some cases exceeding, that of batch retraining.

Generation of Compound Words in Statistical Machine Translation into Compounding Languages
Sara Stymne | Nicola Cancedda | Lars Ahrenberg
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

Prediction of Learning Curves in Machine Translation
Prasanth Kolachina | Nicola Cancedda | Marc Dymetman | Sriram Venkatapathy
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Private Access to Phrase Tables for Statistical Machine Translation
Nicola Cancedda
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Task-Driven Linguistic Analysis based on an Underspecified Features Representation
Stasinos Konstantopoulos | Valia Kordoni | Nicola Cancedda | Vangelis Karkaletsis | Dietrich Klakow | Jean-Michel Renders
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we explore a task-driven approach to interfacing NLP components, where language processing is guided by the end-task that each application requires. The core idea is to generalize feature values into feature value distributions, representing under-specified feature values, and to fit linguistic pipelines with a back-channel of specification requests through which subsequent components can declare to preceding ones the importance of narrowing the value distribution of particular features that are critical for the current task.

2011

Confidence-Weighted Learning of Factored Discriminative Language Models
Viet Ha-Thuc | Nicola Cancedda
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Productive Generation of Compound Words in Statistical Machine Translation
Sara Stymne | Nicola Cancedda
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

A Dataset for Assessing Machine Translation Evaluation Metrics
Lucia Specia | Nicola Cancedda | Marc Dymetman
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a dataset containing 16,000 translations produced by four machine translation systems and manually annotated for quality by professional translators. This dataset can be used in a range of tasks assessing machine translation evaluation metrics, from basic correlation analysis to training and testing machine-learning-based metrics. By providing a standard dataset for such tasks, we hope to encourage the development of better MT evaluation metrics.

Minimum Error Rate Training by Sampling the Translation Lattice
Samidh Chatterjee | Nicola Cancedda
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Machine Translation Using Overlapping Alignments and SampleRank
Benjamin Roth | Andrew McCallum | Marc Dymetman | Nicola Cancedda
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We present a conditional-random-field approach to discriminatively trained phrase-based machine translation in which training and decoding are both cast in a sampling framework and are implemented uniformly in a new probabilistic programming language for factor graphs. In traditional phrase-based translation, decoding infers both a "Viterbi" alignment and the target sentence. In contrast, in our approach a rich overlapping-phrase alignment is produced by a fast deterministic method, while probabilistic decoding infers only the target sentence, which can then leverage arbitrary features of the entire source sentence, target sentence, and alignment. By using SampleRank for learning, we could in principle efficiently estimate hundreds of thousands of parameters. Test-time decoding is done by MCMC sampling with annealing. To demonstrate the potential of our approach, we show preliminary experiments leveraging alignments that may contain overlapping bi-phrases.

Intersecting Hierarchical and Phrase-Based Models of Translation: Formal Aspects and Algorithms
Marc Dymetman | Nicola Cancedda
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

2009

Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem
Mikhail Zaslavskiy | Marc Dymetman | Nicola Cancedda
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Source-Language Entailment Modeling for Translating Unknown Terms
Shachar Mirkin | Lucia Specia | Nicola Cancedda | Ido Dagan | Marc Dymetman | Idan Szpektor
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Complexity-Based Phrase-Table Filtering for Statistical Machine Translation
Nadi Tomeh | Nicola Cancedda | Marc Dymetman
Proceedings of Machine Translation Summit XII: Papers

Estimating the Sentence-Level Quality of Machine Translation Systems
Lucia Specia | Marco Turchi | Nicola Cancedda | Nello Cristianini | Marc Dymetman
Proceedings of the 13th Annual conference of the European Association for Machine Translation

Introduction
Nicola Cancedda
Proceedings of the 13th Annual conference of the European Association for Machine Translation

Sentence-level confidence estimation for MT
Lucia Specia | Nicola Cancedda | Marc Dymetman | Craig Saunders | Marco Turchi | Nello Cristianini | Zhuoran Wang | John Shawe-Taylor
Proceedings of the 13th Annual conference of the European Association for Machine Translation

Closing remarks
Nicola Cancedda
Proceedings of the 13th Annual conference of the European Association for Machine Translation

2008

Shaping research from user requirements, and other exotic things...
Nicola Cancedda
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2005

Une approche à la traduction automatique statistique par segments discontinus
Michel Simard | Nicola Cancedda | Bruno Cavestro | Marc Dymetman | Eric Gaussier | Cyril Goutte | Philippe Langlais | Arne Mauser | Kenji Yamada
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents a statistical machine translation method based on non-contiguous segments, that is, segments made up of words that do not necessarily appear contiguously in the text. We propose a method for producing such segments from word-aligned corpora. We also present a statistical translation model capable of taking such segments into account, as well as a method for learning the model parameters that aims to maximize the accuracy of the translations produced, as measured by the NIST metric. Optimal translations are produced by means of a beam search. Finally, we present experimental results showing how the proposed method allows better generalization from the training data.

Translating with Non-contiguous Phrases
Michel Simard | Nicola Cancedda | Bruno Cavestro | Marc Dymetman | Eric Gaussier | Cyril Goutte | Kenji Yamada | Philippe Langlais | Arne Mauser
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2002

Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition
Cyril Goutte | Hervé Déjean | Eric Gaussier | Nicola Cancedda | Jean-Michel Renders
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

2001

Probabilistic models for PP-attachment resolution and NP analysis
Eric Gaussier | Nicola Cancedda
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (CoNLL)

Learning Computational Grammars
John Nerbonne | Anja Belz | Nicola Cancedda | Hervé Déjean | James Hammerton | Rob Koeling | Stasinos Konstantopoulos | Miles Osborne | Franck Thollard | Erik F. Tjong Kim Sang
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (CoNLL)

2000

Corpus-Based Grammar Specialization
Nicola Cancedda | Christer Samuelsson
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

Experiments with Corpus-based LFG Specialization
Nicola Cancedda | Christer Samuelsson
Sixth Applied Natural Language Processing Conference