Vera Cabarrão

Also published as: Vera Cabarrão


2022

pdf
Fast-Paced Improvements to Named Entity Handling for Neural Machine Translation
Pedro Mota | Vera Cabarrão | Eduardo Farah
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

In this work, we propose a Named Entity handling approach to improve translation quality within an existing Natural Language Processing (NLP) pipeline without modifying the Neural Machine Translation (NMT) component. Our approach seeks to enable fast delivery of such improvements and alleviate user experience problems related to NE distortion. We implement separate NE recognition and translation steps. Then, a combination of standard entity masking technique and a novel semantic equivalent placeholder guarantees that both NE translation is respected and the best overall quality is obtained from NMT. The experiments show that translation quality improves in 38.6% of the test cases when compared to a version of the NLP pipeline with less-developed NE handling capability.

pdf
A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content
Miguel Menezes | Vera Cabarrão | Pedro Mota | Helena Moniz | Alon Lavie
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper describes the research developed at Unbabel, a Portuguese Machine-translation start-up, that combines MT with human post-edition and focuses strictly on customer service content. We aim to contribute to furthering MT quality and good-practices by exposing the importance of having a continuously-in-development robust Named Entity Recognition system compliant with General Data Protection Regulation (GDPR). Moreover, we have tested semiautomatic strategies that support and enhance the creation of Named Entities gold standards to allow a more seamless implementation of Multilingual Named Entities Recognition Systems. The project described in this paper is the result of a shared work between Unbabel ́s linguists and Unbabel ́s AI engineering team, matured over a year. The project should, also, be taken as a statement of multidisciplinary, proving and validating the much-needed articulation between the different scientific fields that compose and characterize the area of Natural Language Processing (NLP).

2014

pdf
Revising the annotation of a Broadcast News corpus: a linguistic approach
Vera Cabarrão | Helena Moniz | Fernando Batista | Ricardo Ribeiro | Nuno Mamede | Hugo Meinedo | Isabel Trancoso | Ana Isabel Mata | David Martins de Matos
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance for several modules. The main focus of the revision process consisted on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resultant revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system, and all the subsequent modules. The resultant data has also been recently used in disfluency studies across domains.