Tiago Luís

Also published as: Tiago Luis


2015

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso | Ramón Fermandez | Silvio Amir | Luís Marujo | Tiago Luís
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Translation errors from English to Portuguese: an annotated corpus
Angela Costa | Tiago Luís | Luísa Coheur
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Analysing translation errors can help us find and describe translation problems in greater detail, and can also suggest where automatic engines should be improved. With these aims in mind, we created a corpus of 150 sentences: 50 from the TAP magazine, 50 from a TED talk, and 50 from the TREC collection of factoid questions. We automatically translated these sentences from English into Portuguese using Google Translate and Moses. After we analysed the errors and created the error annotation taxonomy, the corpus was annotated by a linguist who is a native speaker of Portuguese. Although Google’s overall performance was better in the translation task (we also calculated the BLEU and NIST scores), there are some error types that Moses coped with better, especially discourse-level errors.

2012

An English-Portuguese parallel corpus of questions: translation guidelines and application in SMT
Ângela Costa | Tiago Luís | Joana Ribeiro | Ana Cristina Mendes | Luísa Coheur
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The task of Statistical Machine Translation depends on large amounts of training corpora. Despite the availability of several parallel corpora, these are typically composed of declarative sentences, which may not be appropriate when the goal is to translate other types of sentences, e.g., interrogatives. There have been efforts to create corpora of questions, especially in the context of the evaluation of Question-Answering systems. One of those corpora is the UIUC dataset, composed of nearly 6,000 questions and widely used in the task of Question Classification. In this work, we make available the Portuguese version of the UIUC dataset, which we manually translated, as well as the translation guidelines. We show the impact of this corpus on the performance of a state-of-the-art SMT system when translating questions. Finally, we present a taxonomy of translation errors, according to which we analyze the output of the automatic translation before and after using the corpus as training data.

2011

BP2EP - Adaptation of Brazilian Portuguese texts to European Portuguese
Luis Marujo | Nuno Grazina | Tiago Luis | Wang Ling | Luisa Coheur | Isabel Trancoso
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

Reordering Modeling using Weighted Alignment Matrices
Wang Ling | Tiago Luís | João Graça | Isabel Trancoso | Luísa Coheur
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

The INESC-ID machine translation system for the IWSLT 2010
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe the Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento (INESC-ID) system that participated in the IWSLT 2010 evaluation campaign. Our main goal for this evaluation was to employ several state-of-the-art methods applied to phrase-based machine translation in order to improve the translation quality. Aside from the IBM M4 alignment model, two constrained alignment models were tested, which produced better overall results. These results were further improved by using weighted alignment matrices during phrase extraction, rather than the single best alignment. Finally, we tested several filters that ruled out phrase pairs based on punctuation. Our system was evaluated on the BTEC and DIALOG tasks, achieving a better overall ranking in the DIALOG task.

Towards a general and extensible phrase-extraction algorithm
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Phrase-based systems depend heavily on the quality of their phrase tables, and therefore the process of phrase extraction is a fundamental step. In this paper we present a general and extensible phrase extraction algorithm, in which we highlight several control points. The instantiation of these control points allows the simulation of previous approaches, since different strategies and heuristics can be tested at each of these points. We show how previous approaches fit into this algorithm, compare several of them and, in addition, propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we obtained improvements of 2.4 and 2.8 BLEU points, respectively.

2009

High-Performance High-Volume Layered Corpora Annotation
Tiago Luís | David Martins de Matos
Proceedings of the Third Linguistic Annotation Workshop (LAW III)