2023
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Mary Nurminen | Judith Brenner | Maarit Koponen | Sirkku Latomaa | Mikhail Mikhailov | Frederike Schierl | Tharindu Ranasinghe | Eva Vanmassenhove | Sergi Alvarez Vidal | Nora Aranberri | Mara Nunziatini | Carla Parra Escartín | Mikel Forcada | Maja Popovic | Carolina Scarton | Helena Moniz
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Context-aware and gender-neutral Translation Memories
Marjolene Paulo | Vera Cabarrão | Helena Moniz | Miguel Menezes | Rachel Grewcock | Eduardo Farah
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This work proposes an approach to use Part-Of-Speech (POS) information to automatically detect context-dependent Translation Units (TUs) from a Translation Memory database pertaining to the customer support domain. In line with our goal to minimize context-dependency in TUs, we show how this mechanism can be deployed to create new gender-neutral and context-independent TUs. Our experiments, conducted across Portuguese (PT), Brazilian Portuguese (PT-BR), Spanish (ES), and Spanish-Latam (ES-LATAM), show that the occurrence of certain POS with specific words is accurate in identifying context dependency. In a cross-client analysis, we found that ~10% of the most frequent 13,200 TUs were context-dependent, with gender determining context-dependency in 98% of all confirmed cases. We used these findings to suggest gender-neutral equivalents for the most frequent TUs with gender constraints. Our approach is in use in the Unbabel translation pipeline, and can be integrated into any other Neural Machine Translation (NMT) pipeline.
Quality Fit for Purpose: Building Business Critical Errors Test Suites
Mariana Cabeça | Marianna Buchicchio | Madalena Gonçalves | Christine Maroti | João Godinho | Pedro Coelho | Helena Moniz | Alon Lavie
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This paper illustrates a new methodology based on Test Suites (Avramidis et al., 2018) with a focus on Business Critical Errors (BCEs) (Stewart et al., 2022) to evaluate the output of Machine Translation (MT) and Quality Estimation (QE) systems. We demonstrate the value of relying on semi-automatic evaluation done through scalable BCE-focused Test Suites to monitor the performance of both MT and QE systems for 8 language pairs (LPs) and a total of 4 error categories. This approach allows us not only to track the impact of new features and implementations in a real business environment, but also to identify strengths and weaknesses in models regarding different error types, and subsequently to know what to improve.
Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations
Patrícia Pereira | Helena Moniz | Isabel Dias | Joao Paulo Carvalho
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
Emotion Recognition in Conversations (ERC) has been gaining increasing importance as conversational agents become more and more common. Recognizing emotions is key for effective communication, being a crucial component in the development of effective and empathetic conversational agents. Knowledge and understanding of the conversational context are extremely valuable for identifying the emotions of the interlocutor. We thus approach Emotion Recognition in Conversations by leveraging the conversational context, i.e., taking into account previous conversational turns. The usual approach to modeling the conversational context has been to produce context-independent representations of each utterance and subsequently perform contextual modeling of these. Here we propose context-dependent embedding representations of each utterance by leveraging the contextual representational power of pre-trained transformer language models. In our approach, we feed the conversational context appended to the utterance to be classified as input to the RoBERTa encoder, to which we append a simple classification module, thus removing the need to deal with context after obtaining the embeddings, since these already constitute an efficient representation of such context. We also investigate how the number of introduced conversational turns influences our model's performance. The effectiveness of our approach is validated on the open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset.
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
John Mendonça | Patrícia Pereira | Helena Moniz | Joao Paulo Carvalho | Alon Lavie | Isabel Trancoso
Proceedings of The Eleventh Dialog System Technology Challenge
Despite significant research effort in the development of automatic dialogue evaluation metrics, little thought is given to evaluating dialogues other than in English. At the same time, ensuring metrics are invariant to semantically similar responses is also an overlooked topic. In order to achieve the desired properties of robustness and multilinguality for dialogue evaluation metrics, we propose a novel framework that combines the strengths of current evaluation models with the newly established paradigm of prompting Large Language Models (LLMs). Empirical results show our framework achieves state-of-the-art results in terms of mean Spearman correlation scores across several benchmarks and ranks first on both the Robust and Multilingual tasks of the DSTC11 Track 4 “Automatic Evaluation Metrics for Open-Domain Dialogue Systems”, demonstrating the evaluation capabilities of prompted LLMs.
A Context-Aware Annotation Framework for Customer Support Live Chat Machine Translation
Miguel Menezes | M. Amin Farajian | Helena Moniz | João Varelas Graça
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track
To measure the quality of context-aware machine translation (MT) systems, existing solutions have recommended that human annotators consider the full context of a document. In our work, we revised a well-known MT quality assessment framework, Multidimensional Quality Metrics (MQM) (Lommel et al., 2014), by introducing a set of nine annotation categories that map MT errors to contextual phenomena in the source document; for simplicity's sake, we call these phenomena contextual triggers. Our analysis shows that the adapted category set enhanced MQM’s potential for MT error identification, covering up to 61% more errors than the traditional, non-contextual application of core MQM. Subsequently, we analyzed the severity of these MT “contextual errors”, showing that the majority fall under the critical and major levels, further indicating the impact of such errors. Finally, we measured the ability of existing evaluation metrics to detect the proposed MT “contextual errors”. The results show that current state-of-the-art metrics fall short in detecting MT errors that are caused by contextual triggers on the source document side. With this work, we hope to understand how impactful context is for enhancing quality within an MT workflow and to draw attention to the future integration of the proposed contextual annotation framework into MQM’s core typology.
Dialogue Quality and Emotion Annotations for Customer Support Conversations
John Mendonca | Patrícia Pereira | Miguel Menezes | Vera Cabarrão | Ana C Farinha | Helena Moniz | Alon Lavie | Isabel Trancoso
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Task-oriented conversational datasets often lack topic variability and linguistic diversity. However, with the advent of Large Language Models (LLMs) pretrained on extensive, multilingual and diverse text data, these limitations seem to have been overcome. Nevertheless, their generalisability to different languages and domains in dialogue applications remains uncertain without benchmarking datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By performing annotations that take into consideration the complete instances that compose a conversation, one can form a broader perspective of the dialogue as a whole. Furthermore, this approach provides a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.
2022
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Helena Moniz | Lieve Macken | Andrew Rufener | Loïc Barrault | Marta R. Costa-jussà | Christophe Declercq | Maarit Koponen | Ellie Kemp | Spyridon Pilos | Mikel L. Forcada | Carolina Scarton | Joachim Van den Bogaert | Joke Daems | Arda Tezcan | Bram Vanroy | Margot Fonteyne
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Agent and User-Generated Content and its Impact on Customer Support MT
Madalena Gonçalves | Marianna Buchicchio | Craig Stewart | Helena Moniz | Alon Lavie
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper illustrates a new evaluation framework developed at Unbabel for measuring the quality of source language text and its effect on both Machine Translation (MT) and Human Post-Edition (PE) performed by non-professional post-editors. We examine both agent and user-generated content from the Customer Support domain and propose that differentiating the two is crucial to obtaining high-quality translation output. Furthermore, we present results of initial experimentation with a new evaluation typology based on the Multidimensional Quality Metrics (MQM) Framework (Lommel et al., 2014), specifically tailored toward the evaluation of source language text. We show how the MQM Framework (Lommel et al., 2014) can be adapted to assess errors in monolingual source texts and demonstrate how very specific source errors propagate to the MT and PE targets. Finally, we illustrate how MT systems are not robust enough to handle very specific source noise in the context of Customer Support data.
A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content
Miguel Menezes | Vera Cabarrão | Pedro Mota | Helena Moniz | Alon Lavie
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper describes the research developed at Unbabel, a Portuguese machine translation start-up that combines MT with human post-edition and focuses strictly on customer service content. We aim to contribute to furthering MT quality and good practices by highlighting the importance of having a continuously developed, robust Named Entity Recognition system compliant with the General Data Protection Regulation (GDPR). Moreover, we have tested semi-automatic strategies that support and enhance the creation of Named Entity gold standards to allow a more seamless implementation of multilingual Named Entity Recognition systems. The project described in this paper is the result of shared work between Unbabel’s linguists and Unbabel’s AI engineering team, matured over a year. The project should also be taken as a statement of multidisciplinarity, proving and validating the much-needed articulation between the different scientific fields that compose and characterize the area of Natural Language Processing (NLP).
QUARTZ: Quality-Aware Machine Translation
José G.C. de Souza | Ricardo Rei | Ana C. Farinha | Helena Moniz | André F. T. Martins
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper presents QUARTZ, QUality-AwaRe machine Translation, a project led by Unbabel which aims at developing machine translation systems that are more robust and produce fewer critical errors. With QUARTZ we want to enable machine translation for user-generated conversational content types that do not tolerate critical errors in automatic translations.
Multi3Generation: Multitask, Multilingual, Multimodal Language Generation
Anabela Barreiro | José GC de Souza | Albert Gatt | Mehul Bhatt | Elena Lloret | Aykut Erdem | Dimitra Gkatzia | Helena Moniz | Irene Russo | Fabio Kepler | Iacer Calixto | Marcin Paprzycki | François Portet | Isabelle Augenstein | Mirela Alhasani
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This “meta-paper” will serve as a reference for citations of the Action in future publications. It presents the objectives, the challenges, and the links to the achieved outcomes.
Findings of the WMT 2022 Shared Task on Chat Translation
Ana C Farinha | M. Amin Farajian | Marianna Buchicchio | Patrick Fernandes | José G. C. de Souza | Helena Moniz | André F. T. Martins
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper reports the findings of the second edition of the Chat Translation Shared Task. As in the previous WMT 2020 edition, the task consisted of translating bilingual customer support conversational text. However, unlike the previous edition, in which the bilingual data was created from a synthetic monolingual English corpus, this year we used a portion of Unbabel’s newly released MAIA corpus, which contains genuine bilingual conversations between agents and customers. We also expanded the language pairs to English↔German (en↔de), English↔French (en↔fr), and English↔Brazilian Portuguese (en↔pt-br). Given that the main goal of the shared task is to translate bilingual conversations, participants were encouraged to train and test their models specifically for this environment. In total, we received 18 submissions from 4 different teams. All teams participated in both directions of en↔de. One of the teams also participated in en↔fr and en↔pt-br. We evaluated the submissions with automatic metrics as well as human judgments via Multidimensional Quality Metrics (MQM) on both directions. The official ranking of the systems is based on the overall MQM scores of the participating systems on both directions, i.e. agent and customer.
2020
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Project MAIA: Multilingual AI Agent Assistant
André F. T. Martins | Joao Graca | Paulo Dimas | Helena Moniz | Graham Neubig
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon. MAIA will employ cutting-edge machine learning and natural language processing technologies to build multilingual AI agent assistants, eliminating language barriers. MAIA’s translation layer will empower human agents to provide customer support in real-time, in any language, with human quality.
2017
The INTERACT Project and Crisis MT
Sharon O’Brien | Chao-Hong Liu | Andy Way | João Graça | André Martins | Helena Moniz | Ellie Kemp | Rebecca Petras
Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track
2016
The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
José Lopes | Arodami Chorianopoulou | Elisavet Palogiannidi | Helena Moniz | Alberto Abad | Katerina Louka | Elias Iosif | Alexandros Potamianos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The SpeDial consortium is sharing two datasets that were used during the SpeDial project. By sharing them with the community we are providing a resource to shorten the development cycle of new Spoken Dialogue Systems (SDSs). The datasets include audio recordings and several manual annotations, i.e., miscommunication, anger, satisfaction, repetition, gender and task success. The datasets were created with data from real users and cover two different languages: English and Greek. Detectors for miscommunication, anger and gender were trained for both systems. The detectors were particularly accurate in tasks where humans have high annotator agreement, such as miscommunication and gender. As expected, due to the subjectivity of the task, the anger detector had a less satisfactory performance. Nevertheless, we showed that the automatic detection of situations that can lead to problems in SDSs is possible and can be a promising direction for shortening the SDS development cycle.
SPA: Web-based Platform for easy Access to Speech Processing Modules
Fernando Batista | Pedro Curto | Isabel Trancoso | Alberto Abad | Jaime Ferreira | Eugénio Ribeiro | Helena Moniz | David Martins de Matos | Ricardo Ribeiro
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes it possible to use them through the web. It was developed with the aim of facilitating the usage of the modules, without the need to know about software dependencies and specific configurations. Apart from being accessible through a web browser, the platform also provides a REST API for easy integration with other applications. The platform is flexible, scalable, provides authentication for access restrictions, and was developed taking into consideration the time and effort of providing new services. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including: automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyper-articulation detection, and two external modules for feature extraction and DTMF detection. This paper describes the SPA architecture, presents the already integrated modules, and provides a detailed description of the ones most recently integrated.
2014
Revising the annotation of a Broadcast News corpus: a linguistic approach
Vera Cabarrão | Helena Moniz | Fernando Batista | Ricardo Ribeiro | Nuno Mamede | Hugo Meinedo | Isabel Trancoso | Ana Isabel Mata | David Martins de Matos
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system and all the subsequent modules. The resulting data has also recently been used in disfluency studies across domains.
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
Anabela Barreiro | Fernando Batista | Ricardo Ribeiro | Helena Moniz | Isabel Trancoso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents 3 sets of OpenLogos resources, namely the English-German, the English-French, and the English-Italian bilingual dictionaries. In addition to the usual information on part-of-speech, gender, and number for nouns, offered by most dictionaries currently available, OpenLogos bilingual dictionaries have some distinctive features that make them unique: they contain cross-language morphological information (inflectional and derivational), semantico-syntactic knowledge, indication of the head word in multiword units, information about whether a source word corresponds to a homograph, information about verb auxiliaries, alternate words (i.e., predicate or process nouns), causatives, reflexivity, and verb aspect, among others. The focal point of the paper is the semantico-syntactic knowledge that is important for disambiguation and translation precision. The resources are publicly available at the METANET platform for free use by the research community.
Teenage and adult speech in school context: building and processing a corpus of European Portuguese
Ana Isabel Mata | Helena Moniz | Fernando Batista | Julia Hirschberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present a corpus of European Portuguese spoken by teenagers and adults in school context, CPE-FACES, with an overview of the differential characteristics of high school oral presentations and the challenges this data poses to automatic speech processing. The CPE-FACES corpus has been created with two main goals: to provide a resource for the study of prosodic patterns in both spontaneous and prepared unscripted speech, and to capture inter-speaker and speaking style variations common at school, for research on oral presentations. Research on speaking styles is still largely based on adult speech. References to teenagers are sparse and cross-analyses of speech types comparing teenagers and adults are rare. We expect CPE-FACES, currently a unique resource in this domain, will contribute to filling this gap in European Portuguese. Focusing on disfluencies and phrase-final phonetic-phonological processes we show the impact of teenage speech on the automatic segmentation of oral presentations. Analyzing fluent final intonation contours in declarative utterances, we also show that communicative situation specificities, speaker status and cross-gender differences are key factors in speaking style variation at school.
Prosodic, syntactic, semantic guidelines for topic structures across domains and corpora
Ana Isabel Mata | Helena Moniz | Telmo Móia | Anabela Gonçalves | Fátima Silva | Fernando Batista | Inês Duarte | Fátima Oliveira | Isabel Falé
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents the annotation guidelines applied to naturally occurring speech, aiming at an integrated account of contrast and parallel structures in European Portuguese. These guidelines were defined to allow for the empirical study of interactions among intonation and syntax-discourse patterns in selected sets of different corpora (monologues and dialogues, by adults and teenagers). In this paper we focus on the multilayer annotation process of left periphery structures by using a small sample of highly spontaneous speech in which the distinct types of topic structures are displayed. The analysis of this sample provides fundamental training and testing material for further application in a wider range of domains and corpora. The annotation process comprises the following time-linked levels (manual and automatic): phone, syllable and word level transcriptions (including co-articulation effects); tonal events and break levels; part-of-speech tagging; syntactic-discourse patterns (construction type; construction position; syntactic function; discourse function); and disfluency events. Speech corpora with such a multi-level annotation are a valuable resource for examining the relations between grammar modules in language use from an integrated viewpoint. Such a viewpoint is innovative for our language and has not often been adopted in studies of other languages.
2008
The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese
Isabel Trancoso | Rui Martins | Helena Moniz | Ana Isabel Mata | M. Céu Viana
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes the corpus of university lectures that has been recorded in European Portuguese, and some of the recognition experiments we have done with it. The highly specific topic domain and the spontaneous nature of the lecture speech are two of the most challenging problems. Lexical and language model adaptation proved difficult given the scarcity of domain material in Portuguese, but improvements can be achieved with unsupervised acoustic model adaptation. From the point of view of the study of spontaneous speech characteristics, namely disfluencies, the LECTRA corpus has also proved a very valuable resource.