Helena Caseli


2020

NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations
Helena Caseli | Marcio Inácio
Proceedings of the 12th Language Resources and Evaluation Conference

Machine Translation (MT) is one of the most important natural language processing applications. Independently of the applied MT approach, an MT system automatically generates an equivalent version (in some target language) of an input sentence (in some source language). Recently, a new MT approach has been proposed: neural machine translation (NMT). NMT systems have already outperformed traditional phrase-based statistical machine translation (PBSMT) systems for some pairs of languages. However, any MT approach outputs errors. In this work we present a comparative study of MT errors generated by an NMT system and a PBSMT system trained on the same English – Brazilian Portuguese parallel corpus. This is the first study of this kind involving NMT for Brazilian Portuguese. Furthermore, the analyses and conclusions presented here point out the specific problems of NMT outputs in relation to PBSMT ones and also offer many insights into how to implement automatic post-editing for an NMT system. Finally, the corpora annotated with MT errors generated by both PBSMT and NMT systems are also available.

2017

Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Natalie Vargas | Carlos Ramisch | Helena Caseli
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
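
The final ranking step described in this abstract can be illustrated with a minimal sketch. It assumes the three features have already been computed and scaled to comparable ranges. The class name `Candidate`, the function names, the equal weights, and the feature values below are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate translation of one source MWE with its three feature scores."""
    translation: str
    translation_prob: float   # assumed to lie in [0, 1]
    association: float        # assumed to be pre-scaled to [0, 1]
    similarity: float         # assumed to be pre-scaled to [0, 1]


def combined_score(c, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the three features (weights are illustrative)."""
    w_prob, w_assoc, w_sim = weights
    return (w_prob * c.translation_prob
            + w_assoc * c.association
            + w_sim * c.similarity)


def rank_translations(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the candidates sorted from highest to lowest combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, weights),
                  reverse=True)


# Made-up candidates for the English light verb construction "take a walk".
candidates = [
    Candidate("tomar um passeio", translation_prob=0.3, association=0.2, similarity=0.4),
    Candidate("dar um passeio", translation_prob=0.6, association=0.8, similarity=0.7),
]

ranked = rank_translations(candidates)
for c in ranked:
    print(f"{c.translation}\t{combined_score(c):.2f}")
```

In practice the weights would be tuned or set by hand, and features on different scales would need normalizing before they are combined.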

2015

Never-Ending Multiword Expressions Learning
Alexandre Rondon | Helena Caseli | Carlos Ramisch
Proceedings of the 11th Workshop on Multiword Expressions

2014

Automatic semantic relation extraction from Portuguese texts
Leonardo Sameshima Taba | Helena Caseli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Nowadays we are facing a growing demand for semantic knowledge in computational applications, particularly in Natural Language Processing (NLP). However, there are not sufficient human resources to produce that knowledge at the rate at which it is demanded. Considering the Portuguese language, which has few resources in the semantic area, the situation is even more alarming. Aiming to solve that problem, this work investigates how some semantic relations can be automatically extracted from Portuguese texts. The two main approaches investigated here are based on (i) textual patterns and (ii) machine learning algorithms. Thus, this work investigates how and to what extent these two approaches can be applied to the automatic extraction of seven binary semantic relations (is-a, part-of, location-of, effect-of, property-of, made-of and used-for) in Portuguese texts. The results indicate that machine learning, in particular Support Vector Machines, is a promising technique for the task, although textual patterns achieved better results for the used-for relation.

2009

Statistically-Driven Alignment-Based Multiword Expression Identification for Technical Domains
Helena Caseli | Aline Villavicencio | André Machado | Maria José Finatto
Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (MWE 2009)