Luísa Coheur

Also published as: Luisa Coheur


Searching for COMETINHO: The Little Metric That Could
Ricardo Rei | Ana C Farinha | José G.C. de Souza | Pedro G. Ramos | André F.T. Martins | Luisa Coheur | Alon Lavie
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

In recent years, several neural fine-tuned machine translation evaluation metrics such as COMET and BLEURT have been proposed. These metrics achieve much higher correlations with human judgments than lexical overlap metrics at the cost of computational efficiency and simplicity, limiting their applicability in scenarios in which one has to score thousands of translation hypotheses (e.g. scoring multiple systems or Minimum Bayes Risk decoding). In this paper, we explore optimization techniques, pruning, and knowledge distillation to create more compact and faster COMET versions. Our results show that just by optimizing the code through the use of caching and length batching we can reduce inference time by between 39% and 65% when scoring multiple systems. Also, we show that pruning COMET can lead to a 21% model reduction without affecting the model’s accuracy beyond 0.01 Kendall tau correlation. Furthermore, we present DISTIL-COMET, a lightweight distilled version that is 80% smaller and 2.128x faster while attaining a performance close to the original model and above strong baselines such as BERTSCORE and PRISM.
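The two code-level optimizations named in this abstract, caching and length batching, can be illustrated with a minimal sketch. This is not the COMET implementation: the function name, the `encode_batch` callback, and the toy encoder below are hypothetical stand-ins for a neural encoder, chosen only to show the idea.

```python
from typing import Callable, Dict, List, Sequence


def score_with_cache_and_length_batching(
    segments: Sequence[str],
    encode_batch: Callable[[List[str]], List[float]],
    cache: Dict[str, float],
    batch_size: int = 2,
) -> List[float]:
    """Score segments while encoding each distinct segment at most once."""
    # Caching: only segments that were never seen before need encoding.
    # Sorting by token count (ties broken alphabetically) groups segments
    # of similar length together.
    pending = sorted(
        {s for s in segments if s not in cache},
        key=lambda s: (len(s.split()), s),
    )
    # Length batching: each batch holds neighbours from the sorted list,
    # so a real model would spend less computation on padding.
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for segment, value in zip(batch, encode_batch(batch)):
            cache[segment] = value
    # Results are returned in the caller's original order.
    return [cache[s] for s in segments]


# Hypothetical stand-in for an expensive encoder: it records every batch
# it receives and returns the token count of each segment as its "score".
calls: List[List[str]] = []


def toy_encode(batch: List[str]) -> List[float]:
    calls.append(list(batch))
    return [float(len(s.split())) for s in batch]


cache: Dict[str, float] = {}
sources = ["a b c", "a", "a b c", "a b"]
scores = score_with_cache_and_length_batching(sources, toy_encode, cache)
```

When several systems translate the same source sentences, the shared cache means each source is encoded once rather than once per system, which is the multi-system scenario the abstract refers to.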

COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task
Ricardo Rei | José G. C. de Souza | Duarte Alves | Chrysoula Zerva | Ana C Farinha | Taisiya Glushkova | Alon Lavie | Luisa Coheur | André F. T. Martins
Proceedings of the Seventh Conference on Machine Translation (WMT)

In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics Shared Task. Our primary submission – dubbed COMET-22 – is an ensemble between a COMET estimator model trained with Direct Assessments and a newly proposed multitask model trained to predict sentence-level scores along with OK/BAD word-level tags derived from Multidimensional Quality Metrics error annotations. These models are ensembled together using a hyper-parameter search that weights different features extracted from both evaluation models and combines them into a single score. For the reference-free evaluation, we present CometKiwi. Similarly to our primary submission, CometKiwi is an ensemble between two models: a traditional predictor-estimator model inspired by OpenKiwi, and our new multitask model trained on Multidimensional Quality Metrics, which can also be used without references. Both our submissions show improved correlations compared to state-of-the-art metrics from last year as well as increased robustness to critical errors.

CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Ricardo Rei | Marcos Treviso | Nuno M. Guerreiro | Chrysoula Zerva | Ana C Farinha | Christine Maroti | José G. C. de Souza | Taisiya Glushkova | Duarte Alves | Luisa Coheur | Alon Lavie | André F. T. Martins
Proceedings of the Seventh Conference on Machine Translation (WMT)

We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated in all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance across several language pairs on downstream tasks, and that jointly training with sentence and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations of sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.


Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort
Vânia Mendonça | Ricardo Rei | Luisa Coheur | Alberto Sardinha | Ana Lúcia Santos
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In Machine Translation, assessing the quality of a large amount of automatic translations can be challenging. Automatic metrics are not reliable when it comes to high performing systems. In addition, resorting to human evaluators can be expensive, especially when evaluating multiple systems. To overcome the latter challenge, we propose a novel application of online learning that, given an ensemble of Machine Translation systems, dynamically converges to the best systems, by taking advantage of the human feedback available. Our experiments on WMT’19 datasets show that our online approach quickly converges to the top-3 ranked systems for the language pairs considered, despite the lack of human feedback for many translations.

MT-Telescope: An interactive platform for contrastive evaluation of MT systems
Ricardo Rei | Ana C Farinha | Craig Stewart | Luisa Coheur | Alon Lavie
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present MT-Telescope, a visualization platform designed to facilitate comparative analysis of the output quality of two Machine Translation (MT) systems. While automated MT evaluation metrics are commonly used to evaluate MT systems at a corpus-level, our platform supports fine-grained segment-level analysis and interactive visualisations that expose the fundamental differences in the performance of the compared systems. MT-Telescope also supports dynamic corpus filtering to enable focused analysis on specific phenomena such as translation of named entities, handling of terminology, and the impact of input segment length on translation quality. Furthermore, the platform provides a bootstrapped t-test for statistical significance as a means of evaluating the rigor of the resulting system ranking. MT-Telescope is open source, written in Python, and is built around a user-friendly and dynamic web interface. Complementing other existing tools, our platform is designed to facilitate and promote the broader adoption of more rigorous analysis practices in the evaluation of MT quality.


PE2LGP Animator: A Tool To Animate A Portuguese Sign Language Avatar
Pedro Cabral | Matilde Gonçalves | Hugo Nicolau | Luísa Coheur | Ruben Santos
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

Software for the production of sign languages is much less common than for spoken languages. Such software usually relies on 3D humanoid avatars to produce signs which, inevitably, necessitates the use of animation. One barrier to the use of popular animation tools is their complexity and steep learning curve, which can be hard to master for inexperienced users. Here, we present PE2LGP, an authoring system that features a 3D avatar that signs Portuguese Sign Language. Our Animator is designed specifically to craft sign language animations using a key frame method, and is meant to be easy to use and learn for users without animation skills. We conducted a preliminary evaluation of the Animator, where we animated seven Portuguese Sign Language sentences and asked four sign language users to evaluate their quality. This evaluation revealed that the system, in spite of its simplicity, is indeed capable of producing comprehensible messages.

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

AIA-BDE: A Corpus of FAQs in Portuguese and their Variations
Hugo Gonçalo Oliveira | João Ferreira | José Santos | Pedro Fialho | Ricardo Rodrigues | Luisa Coheur | Ana Alves
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created manually, by humans, or automatically, with Google Translate. It aims to be used as a benchmark for FAQ retrieval and automatic question-answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems, or models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial with a set of unsupervised baselines, especially for manually created variations. Besides high performances obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions, and tested when assigning each variation to one of three possible sources, or assigning them as out-of-domain. Here, the difference between manual and automatic variations was not so significant.

HamNoSyS2SiGML: Translating HamNoSys Into SiGML
Carolina Neves | Luísa Coheur | Hugo Nicolau
Proceedings of the Twelfth Language Resources and Evaluation Conference

Sign Languages are visual languages and the main means of communication used by Deaf people. However, the majority of the information available online is presented in written form. Hence, it is not easily accessible to the Deaf community. Avatars that can animate sign languages have attracted increasing interest in this area due to their flexibility in the process of generation and editing. Synthetic animation of conversational agents can be achieved through the use of notation systems. HamNoSys is one of these systems, which describes movements of the body through symbols. Its XML-compliant form, SiGML, is a machine-readable input of HamNoSys able to animate avatars. Nevertheless, there are currently no freely available open source libraries that allow the conversion from HamNoSys to SiGML. Our goal is to develop an open access tool that can perform this conversion independently from other platforms. This system represents a crucial intermediate step in the bigger pipeline of animating signing avatars. Two case studies are described in order to illustrate different applications of our tool.


BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification
Pedro Mota | Maxine Eskenazi | Luísa Coheur
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We propose BeamSeg, a joint model for segmentation and topic identification of documents from the same domain. The model assumes that lexical cohesion can be observed across documents, meaning that segments describing the same topic use a similar lexical distribution over the vocabulary. The model implements lexical cohesion in an unsupervised Bayesian setting by drawing from the same language model segments with the same topic. Contrary to previous approaches, we assume that language models are not independent, since the vocabulary changes in consecutive segments are expected to be smooth and not abrupt. We achieve this by using a dynamic Dirichlet prior that takes into account data contributions from other topics. BeamSeg also models segment length properties of documents based on modality (textbooks, slides, etc.). The evaluation is carried out in three datasets. In two of them, improvements of up to 4.8% and 7.3% are obtained in the segmentation and topic identification tasks, indicating that both tasks should be jointly modeled.

L2F/INESC-ID at SemEval-2019 Task 2: Unsupervised Lexical Semantic Frame Induction using Contextualized Word Representations
Eugénio Ribeiro | Vânia Mendonça | Ricardo Ribeiro | David Martins de Matos | Alberto Sardinha | Ana Lúcia Santos | Luísa Coheur
Proceedings of the 13th International Workshop on Semantic Evaluation

Building large datasets annotated with semantic information, such as FrameNet, is an expensive process. Consequently, such resources are unavailable for many languages and specific domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of the second task of SemEval 2019, which comprises three subtasks: clustering of verbs that evoke the same frame and clustering of arguments into both frame-specific slots and semantic roles. We approach all the subtasks by applying a graph clustering algorithm on contextualized embedding representations of the verbs and arguments. Using such representations is appropriate in the context of this task, since they provide cues for word-sense disambiguation. Thus, they can be used to identify different frames evoked by the same words. Using this approach we were able to outperform all of the baselines reported for the task on the test set in terms of Purity F1, as well as in terms of BCubed F1 in most cases.


L2F/INESC-ID at SemEval-2017 Tasks 1 and 2: Lexical and semantic features in word and textual similarity
Pedro Fialho | Hugo Patinho Rodrigues | Luísa Coheur | Paulo Quaresma
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our approach to the SemEval-2017 “Semantic Textual Similarity” and “Multilingual Word Similarity” tasks. In the former, we test our approach in both English and Spanish, and use a linguistically-rich set of features. These move from lexical to semantic features. In particular, we try to take advantage of the recent Abstract Meaning Representation and SMATCH measure. Although without state of the art results, we introduce semantic structures in textual similarity and analyze their impact. Regarding word similarity, we target the English language and combine WordNet information with Word Embeddings. Without matching the best systems, our approach proved to be simple and effective.


A study on the production of collocations by European Portuguese learners
Ângela Costa | Luísa Coheur | Teresa Lino
Proceedings of the 12th Workshop on Multiword Expressions

QGASP: a Framework for Question Generation Based on Different Levels of Linguistic Information
Hugo Patinho Rodrigues | Luísa Coheur | Eric Nyberg
Proceedings of the 9th International Natural Language Generation conference

Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
Ângela Costa | Rui Correia | Luísa Coheur
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we describe a corpus of automatic translations annotated with both error type and quality. The 300 sentences that we have selected were generated by Google Translate, Systran and two in-house Machine Translation systems that use Moses technology. The errors present in the translations were annotated with an error taxonomy that divides errors into five main linguistic categories (Orthography, Lexis, Grammar, Semantics and Discourse), reflecting the language level where the error is located. After the error annotation process, we assessed the translation quality of each sentence using a four point comprehension scale from 1 to 5. Both tasks of error and quality annotation were performed by two different annotators, achieving good levels of inter-annotator agreement. The creation of this corpus allowed us to use it as training data for a translation quality classifier. We drew conclusions on error severity by observing the outputs of two machine learning classifiers: a decision tree and a regression model.


Proceedings of the Fourth Workshop on Vision and Language
Anja Belz | Luisa Coheur | Vittorio Ferrari | Marie-Francine Moens | Katerina Pastra | Ivan Vulić
Proceedings of the Fourth Workshop on Vision and Language

Coupling Natural Language Processing and Animation Synthesis in Portuguese Sign Language Translation
Inês Almeida | Luísa Coheur | Sara Candeias
Proceedings of the Fourth Workshop on Vision and Language

From European Portuguese to Portuguese Sign Language
Inês Almeida | Luísa Coheur | Sara Candeias
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies


JUST.ASK, a QA system that learns to answer new questions from previous interactions
Sérgio Curto | Ana C. Mendes | Pedro Curto | Luísa Coheur | Ângela Costa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present JUST.ASK, a freely and publicly available Question Answering system. Its architecture is composed of the usual Question Processing, Passage Retrieval and Answer Extraction components. Several details on the information generated and manipulated by each of these components are also provided to the user when interacting with the demonstration. Since JUST.ASK also learns to answer new questions based on users’ feedback, the user is invited to identify the correct answers. These will then be used to retrieve answers to future questions.

Translation errors from English to Portuguese: an annotated corpus
Angela Costa | Tiago Luís | Luísa Coheur
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Analysing translation errors is a task that can help us find and describe translation problems in greater detail, but can also suggest where the automatic engines should be improved. Having these aims in mind we have created a corpus composed of 150 sentences, 50 from the TAP magazine, 50 from a TED talk and the other 50 from the TREC collection of factoid questions. We have automatically translated these sentences from English into Portuguese using Google Translate and Moses. After we analysed the errors and created the error annotation taxonomy, the corpus was annotated by a linguist who is a native speaker of Portuguese. Although Google’s overall performance was better in the translation task (we have also calculated the BLEU and NIST scores), there are some error types that Moses was better at coping with, especially discourse level errors.


Meet EDGAR, a tutoring agent at MONSERRATE
Pedro Fialho | Luísa Coheur | Sérgio Curto | Pedro Cláudio | Ângela Costa | Alberto Abad | Hugo Meinedo | Isabel Trancoso
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations


An English-Portuguese parallel corpus of questions: translation guidelines and application in SMT
Ângela Costa | Tiago Luís | Joana Ribeiro | Ana Cristina Mendes | Luísa Coheur
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The task of Statistical Machine Translation depends on large amounts of training corpora. Despite the availability of several parallel corpora, these are typically composed of declarative sentences, which may not be appropriate when the goal is to translate other types of sentences, e.g., interrogatives. There have been efforts to create corpora of questions, especially in the context of the evaluation of Question-Answering systems. One of those corpora is the UIUC dataset, composed of nearly 6,000 questions, widely used in the task of Question Classification. In this work, we make available the Portuguese version of the UIUC dataset, which we manually translated, as well as the translation guidelines. We show the impact of this corpus on the performance of a state-of-the-art SMT system when translating questions. Finally, we present a taxonomy of translation errors, according to which we analyze the output of the automatic translation before and after using the corpus as training data.

Extending a wordnet framework for simplicity and scalability
Pedro Fialho | Sérgio Curto | Ana Cristina Mendes | Luísa Coheur
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The WordNet knowledge model is currently implemented in multiple software frameworks providing procedural access to language instances of it. Frameworks tend to be focused on structural/design aspects of the model, thus describing low level interfaces for linguistic knowledge retrieval. Typically the only high level feature directly accessible is word lookup, while traversal of semantic relations leads to verbose/complex combinations of data structures, pointers and indexes which are irrelevant in an NLP context. Here we describe an extension to the JWNL framework that hides the technical requirements of access to WordNet features behind an essentially word/sense based API applying terminology from the official online interface. This high level API is applied to the original English version of WordNet and to an SQL based Portuguese lexicon, translated into a WordNet based representation usable by JWNL.

Dealing with unknown words in statistical machine translation
João Silva | Luísa Coheur | Ângela Costa | Isabel Trancoso
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In Statistical Machine Translation, words that were not seen during training are unknown words, that is, words that the system will not know how to translate. In this paper we contribute to this research problem by profiting from orthographic cues given by words. Thus, we report a study of the impact of word distance metrics on cognate detection and, in addition, on the possibility of obtaining translations of unknown words through Logical Analogy. Our approach is tested in the translation of corpora from Portuguese to English (and vice-versa).


BP2EP - Adaptation of Brazilian Portuguese texts to European Portuguese
Luis Marujo | Nuno Grazina | Tiago Luis | Wang Ling | Luisa Coheur | Isabel Trancoso
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Named entity translation using anchor texts
Wang Ling | Pável Calado | Bruno Martins | Isabel Trancoso | Alan Black | Luísa Coheur
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates a NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents and finding the best translation from the anchor texts, using a combination of features, some of which are specific to anchor texts. Experiments performed on a manually built corpus suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using solely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.

Exploring linguistically-rich patterns for question generation
Sérgio Curto | Ana Cristina Mendes | Luísa Coheur
Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop

Reordering Modeling using Weighted Alignment Matrices
Wang Ling | Tiago Luís | João Graça | Isabel Trancoso | Luísa Coheur
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies


The INESC-ID machine translation system for the IWSLT 2010
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe the Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento (INESC-ID) system that participated in the IWSLT 2010 evaluation campaign. Our main goal for this evaluation was to employ several state-of-the-art methods applied to phrase-based machine translation in order to improve the translation quality. Aside from the IBM M4 alignment model, two constrained alignment models were tested, which produced better overall results. These results were further improved by using weighted alignment matrices during phrase extraction, rather than the single best alignment. Finally, we tested several filters that ruled out phrase pairs based on punctuation. Our system was evaluated on the BTEC and DIALOG tasks, having achieved a better overall ranking in the DIALOG task.

Towards a general and extensible phrase-extraction algorithm
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Phrase-based systems deeply depend on the quality of their phrase tables and therefore, the process of phrase extraction is always a fundamental step. In this paper we present a general and extensible phrase extraction algorithm, where we have highlighted several control points. The instantiation of these control points allows the simulation of previous approaches, as in each one of these points different strategies/heuristics can be tested. We show how previous approaches fit in this algorithm, compare several of them and, in addition, we propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we have obtained an improvement in the results of 2.4 and 2.8 BLEU points, respectively.

Named Entity Recognition in Questions: Towards a Golden Collection
Ana Cristina Mendes | Luísa Coheur | Paula Vaz Lobo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Named Entity Recognition (NER) plays a relevant role in several Natural Language Processing tasks. Question-Answering (QA) is an example of such, since answers are frequently named entities in agreement with the semantic category expected by a given question. In this context, the recognition of named entities is usually applied to free text data. NER in natural language questions can also aid QA and, thus, should not be disregarded. Nevertheless, it has not yet been given the necessary importance. In this paper, we approach the identification and classification of named entities in natural language questions. We hypothesize that NER results can benefit from the inclusion of previously labeled questions in the training corpus. We present a broad study addressing that hypothesis, focusing on the balance to be achieved between the amount of free text and questions in order to build a suitable training corpus. This work also contributes by providing a set of nearly 5,500 annotated questions with their named entities, freely available for research purposes.


Building a Golden Collection of Parallel Multi-Language Word Alignment
João Graça | Joana Paulo Pardal | Luísa Coheur | Diamantino Caseiro
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports an experience on producing manual word alignments over six different language pairs (all combinations between Portuguese, English, French and Spanish) (Graça et al., 2008). Word alignment of each language pair is made over the first 100 sentences of the common test set from the Europarl corpora (Koehn, 2005), corresponding to 600 new annotated sentences. This collection is publicly available at http://www.l2f.inesc- It contains, to our knowledge, the first word alignment gold set for the Portuguese language, with three other languages. Besides, it is, to our knowledge, the first multi-language manual word aligned parallel corpus, where the same sentences are annotated for each language pair. We started by using the guidelines presented in (Mariño, 2005) and performed several refinements: some due to under-specifications in the original guidelines, others because of disagreement on some choices. This led to the development of an extensive new set of guidelines for multi-lingual word alignment annotation that, we believe, makes the alignment process less ambiguous. We evaluate the inter-annotator agreement, obtaining an average of 91.6% agreement between the different language pairs.


João V. Graça | Diamantino Caseiro | Luísa Coheur
Proceedings of the Fourth International Workshop on Spoken Language Translation

We present the machine translation system used by L2F from INESC-ID in the evaluation campaign of the International Workshop on Spoken Language Translation (2007), in the task of translating spontaneous conversations in the travel domain from Italian to English.


From a Surface Analysis to a Dependency Structure
Luisa Coheur | Nuno Mamede | Gabriel G. Bes
Proceedings of the Workshop on Recent Advances in Dependency Grammar

A step towards incremental generation of logical forms
Luísa Coheur | Nuno Mamede | Gabriel Bès
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)