The translator profile at the European Central Bank (ECB) has evolved rapidly over the past years. Staff of the linguistic services moved from their traditional set of tasks - translation, editing and proofreading, often done on printouts or in plain Word documents - to dealing with a multitude of tasks, such as project management, procurement, recruitment and training, and lately cultural mediation. But the real game changer in this transformation has been the quantum leap that technology applied to translation (computer-assisted translation (CAT) and machine translation (MT)) has enabled in complex organisational structures. This brought about significant challenges in terms of people and change management, but also presented huge opportunities and the chance to showcase the ECB’s language services as a cutting-edge professional unit within the organisation. The transformation is still underway, but we are not in uncharted waters anymore. In fact, we are broadening our interest in new technologies to make our communication more accessible, even beyond the pure translation function.
Nowadays there is a pressing need to develop interpreting-related technologies, with practitioners and other end-users increasingly calling for tools tailored to their needs and their new interpreting scenarios. At the same time, interpreting as a human activity has resisted complete automation for various reasons, such as fear, unawareness, communication complexities and a lack of dedicated tools. Several computer-assisted interpreting tools and resources for interpreters have been developed, although they are rather modest in terms of the support they provide. In the same vein, and despite the pressing need to aid multilingual mediation, machine interpreting is still under development, with the exception of a few success stories. This paper will present the results of VIP, an R&D project on language technologies applied to interpreting. It is the ‘seed’ of a family of projects on interpreting technologies which are currently being developed or have just been completed at the Research Institute of Multilingual Language Technologies (IUITLM), University of Malaga.
Due to the widespread development of Machine Translation (MT) systems, especially Neural Machine Translation (NMT) systems, MT evaluation, both automatic and human, has become increasingly important, as it helps us establish how MT systems perform. Yet automatic evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU, METEOR and ROUGE) may correlate poorly with human judgments. This paper seeks to put to the test an evaluation model based on a novel deep learning schema (NoDeeLe) used to compare two NMT systems on four different text genres, i.e. medical, legal, marketing and literary, in the English-Greek language pair. The model utilizes information from the source segments, the MT outputs and the reference translation, as well as the automatic metrics BLEU, METEOR and WER. The proposed schema achieves a strong correlation with human judgment (78% average accuracy across the four texts, with the highest accuracy, 85%, observed for the marketing text), while it outperforms classic machine learning algorithms and automatic metrics.
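To make the metric inputs concrete, here is a minimal sketch of how the three scores such a schema consumes could be computed per segment; it assumes the sacrebleu, nltk and jiwer Python libraries, and the example segments are invented.

```python
# Hedged sketch: per-segment metric features for the classifier.
import sacrebleu
import jiwer
from nltk.translate.meteor_score import meteor_score  # needs nltk.download('wordnet')

def metric_features(mt_output: str, reference: str) -> list:
    """Return [BLEU, METEOR, WER] for one MT segment against its reference."""
    bleu = sacrebleu.sentence_bleu(mt_output, [reference]).score
    meteor = meteor_score([reference.split()], mt_output.split())
    wer = jiwer.wer(reference, mt_output)
    return [bleu, meteor, wer]

print(metric_features("the patient shows mild symptoms",
                      "the patient presents mild symptoms"))
```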
Social media companies as well as censorship authorities make extensive use of artificial intelligence (AI) tools to monitor postings for hate speech, celebrations of violence or profanity. Since AI software requires massive volumes of data to train computers, automatic translation of online content is usually implemented to compensate for the scarcity of text in some languages. However, machine translation (MT) mistakes are a regular occurrence when translating sentiment-oriented user-generated content (UGC), especially when a low-resource language is involved. In such scenarios, the adequacy of the whole process relies on the assumption that the translation can be evaluated correctly. In this paper, we assess the ability of automatic quality metrics to detect critical machine translation errors which can cause serious misunderstanding of the affective message. We compare the performance of three canonical metrics on meaningless translations and on meaningful translations with a critical error that distorts the overall sentiment of the source text. We demonstrate the need to fine-tune automatic metrics to make them more robust in detecting sentiment-critical errors.
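As an illustration of the failure mode under study (not the paper’s actual test set), the sketch below scores a fluent translation with a flipped sentiment against a garbled, meaningless one; a surface metric such as BLEU can rank the sentiment-distorting output higher.

```python
# Hedged sketch: a sentiment-critical error vs. a meaningless translation.
import sacrebleu

reference   = "I am not happy with this product at all"
critical    = "I am happy with this product"   # negation lost: sentiment flipped
meaningless = "happy the product this with am" # garbled, carries no message

for name, hyp in [("critical", critical), ("meaningless", meaningless)]:
    score = sacrebleu.sentence_bleu(hyp, [reference]).score
    print(f"{name}: BLEU = {score:.1f}")
# The fluent but sentiment-flipped output typically scores higher.
```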
This work is based on the testing of a remote interpreting (RI) delivery platform conducted a year before the disruptive COVID-19 pandemic outbreak, and aimed at assessing the use and experience of such systems in a university setting. A survey was administered to the different groups of users (interpreters, audience and speakers) involved in two tests to collect their responses and remarks, and to assess trends and perceptions in their experience. According to the emerging findings of the research project, RI was already considered an established yet still-maturing resource for conference settings, with potential convenience and benefits for each group of users. However, participants’ remarks already suggested that all the parties involved in the industry need to collaborate to effectively improve and enhance such services. Specific training on RI modalities would also appear to be increasingly necessary for interpreters to adapt to newly emerging working conditions and meet a growing demand, and training institutions will increasingly have to offer adequate solutions, while this technological shift also requires receptiveness and adaptability in an abruptly diversifying and evolving profession.
The relationship between stress, performance and Remote Interpreting (RI)/Remote Simultaneous Interpreting (RSI) has been widely studied in academic, professional and corporate research over the past fifty years. Most of this research has attempted to correlate RI/RSI with changes in stress levels and performance, with little to no relevant results to suggest causality. While no significant clinical causality has been found between RI/RSI and stress, self-perceived stress during RI, and especially RSI, among practicing conference interpreters is consistently high, and recent studies suggest it is on the increase. Similar results have been observed for performance, which is consistently self-assessed by practicing interpreters as poorer during RI/RSI than during in-person interpreting; however, no significant decrease in performance has been observed by independent reviewers. Several scholars have suggested a correlation between such low self-perceived performance / high self-perceived stress and a lack of control, which might result from exposure to unknown factors during RI/RSI, prominently technological elements, whose performance no longer relies on third parties but lies with the interpreters themselves. This paper is centered on the same hypothesis and proposes actions that interpreters can undertake to help regain control and thus improve their attitude toward RI/RSI.
The domain-specialised application of Named Entity Recognition (NER) is known as Biomedical NER (BioNER), which aims to identify and classify biomedical concepts that are of interest to researchers, such as genes, proteins, chemical compounds, drugs, mutations and diseases. The BioNER task is very similar to general NER, but recognising Biomedical Named Entities (BNEs) is more challenging than recognising proper names in newspapers due to the characteristics of biomedical nomenclature. To address the challenges posed by BioNER, seven machine learning models were implemented, comparing a transfer learning approach based on fine-tuned BERT with Bi-LSTM-based neural models and a CRF model used as a baseline. Precision, Recall and F1-score were used as performance measures to evaluate the models on two well-known biomedical corpora: JNLPBA and BIOCREATIVE IV (BC-IV). Strict and partial matching were considered as evaluation criteria. The reported results show that the transfer learning approach based on fine-tuned BERT outperforms all other methods, achieving the highest scores for all metrics on both corpora.
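A minimal sketch of such a transfer learning setup, assuming the Hugging Face transformers library; the checkpoint, label count (JNLPBA’s five entity types in a BIO scheme) and example sentence are illustrative, not the authors’ exact configuration.

```python
# Hedged sketch: BERT with a token-classification head for BioNER.
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# JNLPBA: 5 entity types -> 11 BIO labels (assumption for illustration)
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=11)

tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
logits = model(**enc).logits   # one score vector per subword token
pred = logits.argmax(-1)       # label ids; map back to BIO tags after fine-tuning
print(pred)
```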
Named Entity Recognition (NER) is an essential task in natural language processing that detects entities and classifies them into predetermined categories. An entity is a meaningful word or phrase that refers to a proper noun. Named entities play an important role in NLP tasks such as Information Extraction, Question Answering and Machine Translation. In Machine Translation, named entities often cause translation failures regardless of local context, affecting the output quality of the translation. Annotating named entities is a time-consuming and expensive process, especially for low-resource languages. One solution to this problem is to use word alignment methods on bilingual parallel corpora in which just one side has been annotated. The goal is to extract named entities in the target language by using the annotated corpus of the source language. In this paper, we compare the performance of two alignment methods, the Grow-diag-final-and and Intersect Symmetrisation heuristics, to exploit annotation projection in an English-Brazilian Portuguese bilingual corpus to detect named entities in Brazilian Portuguese. A NER model trained on the annotated data extracted by the alignment methods is used to evaluate the performance of the aligners. Experimental results show that Intersect Symmetrisation achieves superior performance scores compared to the Grow-diag-final-and heuristic for Brazilian Portuguese.
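A simplified sketch of the projection step, under the assumption of clean one-to-one alignments: source-side BIO tags are copied onto aligned target tokens. A real pipeline would also repair BIO spans broken by the projection.

```python
# Hedged sketch: project source NE tags through word alignments.
def project_tags(src_tags, alignment, tgt_len):
    """Copy each non-O tag onto the aligned target position."""
    tgt_tags = ["O"] * tgt_len
    for s, t in alignment:
        if src_tags[s] != "O":
            tgt_tags[t] = src_tags[s]
    return tgt_tags

# "Barack Obama visited Brasilia" -> "Barack Obama visitou Brasilia"
src_tags = ["B-PER", "I-PER", "O", "B-LOC"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # e.g. Intersect output
print(project_tags(src_tags, alignment, 4))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```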
The aim of this paper is to describe the process carried out to develop a parallel corpus comprised of texts extracted from the corporate websites of southern Spanish SMEs in the sanitary sector, which will serve as the basis for MT quality assessment. The stages for compiling the parallel corpora were: (i) selection of websites with content translated into English and Spanish, (ii) downloading of the HTML files of the selected websites, (iii) filtering of the files and pairing of English files with their Spanish equivalents, (iv) compilation of individual corpora (EN and ES) for each of the selected websites, (v) merging of the individual corpora into two general corpora, one in English and the other in Spanish, (vi) selection of a representative sample of segments to be used as originals (ES) and reference translations (EN), and (vii) building of the parallel corpus intended for MT evaluation. The parallel corpus generated will serve for future Machine Translation quality assessment. In addition, the monolingual corpora generated during the process could serve as a basis for research focused on linguistic analysis, whether bilingual or monolingual.
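A simplified sketch of stages (ii) and (iii), assuming the requests and BeautifulSoup libraries; the URLs and the path-based pairing are illustrative placeholders, not the actual selection criteria.

```python
# Hedged sketch: download paired EN/ES pages and extract plain text.
import requests
from bs4 import BeautifulSoup

def fetch_text(url: str) -> str:
    """Download one page and strip its HTML markup."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)

# Hypothetical paired URLs; real pairing used the sites' own structure.
pairs = [("https://example.com/es/productos", "https://example.com/en/products")]
corpus_es, corpus_en = [], []
for es_url, en_url in pairs:
    corpus_es.append(fetch_text(es_url))
    corpus_en.append(fetch_text(en_url))
```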
We present a system to support simultaneous interpreting in specific domains. The system will be developed through close collaboration between technicians, mostly experts in both speech and text processing, and end-users, i.e. professional interpreters who define the requirements and will test the final product. Some preliminary encouraging results have been achieved on benchmark tests collected with the aim of measuring the performance of individual components of the whole system, namely automatic speech recognition (ASR) and named entity recognition.
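A high-level sketch of how the two evaluated components could be chained; spaCy’s NER API is real, while asr_transcribe() is a hypothetical stand-in for whichever ASR engine the system adopts.

```python
# Hedged sketch: ASR output fed into an off-the-shelf NER component.
import spacy

nlp = spacy.load("en_core_web_sm")  # pre-trained pipeline with NER

def asr_transcribe(audio_path: str) -> str:
    """Hypothetical placeholder for the system's ASR component."""
    raise NotImplementedError("plug in the ASR engine here")

def entities_from_speech(audio_path: str):
    transcript = asr_transcribe(audio_path)
    doc = nlp(transcript)
    return [(ent.text, ent.label_) for ent in doc.ents]
```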
Communication between healthcare professionals and deaf patients is challenging, and the current COVID-19 pandemic makes this issue even more acute. Sign language interpreters often cannot enter hospitals, and face masks make lipreading impossible. To address this urgent problem, we developed a system which allows healthcare professionals to translate sentences that are frequently used in the diagnosis and treatment of COVID-19 into Sign Language of the Netherlands (NGT). Translations are displayed by means of videos and avatar animations. The architecture of the system is such that it could be extended to other applications and other sign languages in a relatively straightforward way.
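A minimal sketch of the core lookup such a system implies, with hypothetical phrases and file names: frequently used sentences map to pre-recorded NGT videos, and the table could be swapped out for another domain or sign language.

```python
# Hedged sketch: phrase-to-video lookup; entries are invented examples.
NGT_VIDEOS = {
    "Do you have a fever?": "ngt/fever_question.mp4",
    "Take a deep breath.": "ngt/deep_breath.mp4",
}

def translate_to_ngt(sentence: str) -> str:
    """Return the pre-recorded NGT video for a known sentence."""
    video = NGT_VIDEOS.get(sentence)
    if video is None:
        raise KeyError(f"No NGT translation recorded for: {sentence!r}")
    return video

print(translate_to_ngt("Do you have a fever?"))
```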
The aim of this paper is to investigate the similarity measurement approach of translation memory (TM) in five representative computer-aided translation (CAT) tools when retrieving inflectional verb-variation sentences in Arabic-to-English translation. In English, inflectional affixes in verbs are limited to suffixes; unlike English, Arabic verbs mark voice, mood, tense, number and person through various inflectional affixes, e.g. before or after the verb root. The research question focuses on establishing whether the TM similarity algorithm measures a combination of the inflectional affixes as a word intervention or as a character intervention when retrieving a segment. If it is dealt with as a character intervention, are the types of intervention penalized equally or differently? This paper experimentally examines, through a black-box testing methodology and a test suite instrument, the penalties that the current algorithms of TM systems impose when input segments and retrieved TM sources are exactly the same, except for a difference in an inflectional affix. It would be expected that, if TM systems had some linguistic knowledge, the penalty would be very light, which would be useful to translators, since a high-scoring match would be presented near the top of the list of proposals. However, analysis of the TM systems’ output shows that inflectional affixes are penalized more heavily than expected, and in different ways. They may be treated as an intervention on the whole word, or as a single character change.
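The contrast under investigation can be sketched with a plain Levenshtein distance applied at word level versus character level (actual CAT tools use proprietary variants of such scoring); the transliterated Arabic pair below differs only in an inflectional affix.

```python
# Hedged sketch: fuzzy-match scoring at word vs. character granularity.
def levenshtein(a, b):
    """Standard edit distance over strings or token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def match(a, b, level):
    a, b = (a.split(), b.split()) if level == "word" else (a, b)
    return 1 - levenshtein(a, b) / max(len(a), len(b))

seg, tm = "katabtu alrisalata", "katabat alrisalata"  # one affix differs
print(match(seg, tm, "word"))  # heavy penalty: the whole word counts as changed
print(match(seg, tm, "char"))  # light penalty: only two characters differ
```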
In recent years, the emergence of streaming platforms such as Netflix, HBO or Amazon Prime Video has reshaped the field of entertainment, which increasingly relies on subtitling, dubbing or voice-over modes. However, little is known about audiovisual translation when it involves Neural Machine Translation (NMT) engines. This work-in-progress paper seeks to examine the English subtitles of the first episode of the popular Spanish Netflix series Cable Girls and the translated versions generated by Google Translate and DeepL. Such analysis will help us determine whether there are significant linguistic differences that could lead to miscomprehension or cultural shock. To this end, the corpus compiled consists of the Spanish script, the English subtitles available on Netflix and the translated versions of the script. For the analysis of the data, errors have been classified following the DQF/MQM Error Typology and evaluated with the automatic BLEU metric. Results show that NMT engines offer good-quality translations, which in turn may benefit translators working with audiovisual entertainment resources.
The exponential growth of the internet and social media in the past decade has given rise to an increase in the dissemination of false or misleading information. Since the 2016 US presidential election, the term “fake news” has become increasingly popular and the phenomenon has received more attention. In recent years several fact-checking agencies have been created, but due to the great number of daily posts on social media, manual checking is insufficient. Currently, there is a pressing need for automatic fake news detection tools, either to assist manual fact-checkers or to operate as standalone tools. There are several projects underway on this topic, but most of them focus on English. This research-in-progress paper discusses the employment of deep learning methods, and the development of a tool, for detecting false news in Portuguese. As a first step we shall compare well-established architectures that have been tested in other languages and analyse their performance on our Portuguese data. Based on the preliminary results of these classifiers, we shall choose a deep learning model, or combine several deep learning models, which holds promise to enhance the performance of our fake news detection system.
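As a point of reference for the planned comparison, a shallow baseline of the kind deep models are usually measured against might look as follows; the Portuguese examples and labels are invented, and scikit-learn is assumed.

```python
# Hedged sketch: shallow baseline for fake news classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy examples: 0 = legitimate, 1 = fake
texts = ["Vacina aprovada após testes clínicos rigorosos",
         "Cientistas escondem cura milagrosa do público"]
labels = [0, 1]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["Nova cura milagrosa escondida por cientistas"]))
```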
Advances in the field of text simplification (TS) have primarily concerned syntactic or lexical simplification. However, conceptual simplification has previously been identified as another area of TS with the potential to significantly improve reading comprehension. A first step towards measuring conceptual simplification is the classification of concepts as either complex or simple. This research-in-progress paper proposes a new definition of conceptual complexity alongside a simple machine-learning approach that performs a binary classification task to distinguish between simple and complex concepts. It is proposed that this be a first step in developing new text simplification models that operate on a conceptual level.
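An illustrative sketch of the kind of shallow features such a binary classifier could draw on, here sense count and taxonomic depth from WordNet; the feature set is an assumption for illustration, not the paper’s.

```python
# Hedged sketch: shallow features for concept complexity classification.
from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')

def concept_features(concept: str) -> dict:
    """Number of senses and deepest hypernym path for one concept."""
    synsets = wn.synsets(concept)
    depth = max((s.max_depth() for s in synsets), default=0)
    return {"n_senses": len(synsets), "taxonomic_depth": depth}

for concept in ["dog", "photosynthesis"]:
    print(concept, concept_features(concept))
```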
The development of translation technologies, like Translation Memory (TM) and Machine Translation (MT), has completely changed the translation industry and the translator’s workflow in recent decades. Nevertheless, TM and MT were developed separately until very recently. This ongoing project will study the external integration of TM and MT, examining whether translators’ productivity and post-editing effort are higher or lower than when using only TM. To this end, we will conduct an experiment in which translation students and professional translators will be asked to translate two short texts; we will then measure the post-editing effort (temporal, technical and cognitive) and the quality of the translated texts.
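Of the three effort measures, technical effort lends itself to a simple sketch: counting edit operations between the MT output and the post-edited segment. Here difflib is a rough stand-in for the keystroke logging an actual experiment would use; temporal effort would simply be the time spent per segment.

```python
# Hedged sketch: approximate technical post-editing effort.
import difflib

def technical_effort(mt_output: str, post_edited: str) -> int:
    """Count non-matching edit operations between MT output and post-edit."""
    matcher = difflib.SequenceMatcher(None, mt_output, post_edited)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")

print(technical_effort("The patient present mild symptom.",
                       "The patient presents mild symptoms."))
```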
Despite the increasingly good quality of Machine Translation (MT) systems, MT outputs require corrections. Automatic Post-Editing (APE) models have been introduced to perform these corrections without human intervention. However, no system has been able to fully automate the Post-Editing (PE) process. Moreover, while numerous translation tools, such as Translation Memories (TMs), largely benefit from translators’ input, Human-Computer Interaction (HCI) remains limited when it comes to PE. This research-in-progress paper discusses APE models and suggests that they could be improved in more interactive scenarios, as previously done in MT with the creation of Interactive MT (IMT) systems. Based on the hypothesis that PE would benefit from HCI, two methodologies are proposed. Both suggest that traditional batch learning settings are not optimal for PE. Instead, online techniques are recommended to train and update PE models on the fly, via either real or simulated interactions with the translator.
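The online setting the paper recommends can be sketched as a per-segment update loop; the model below is a trivial stand-in (a real APE system would be a neural sequence-to-sequence model), and the vectors are placeholders for encoded segments.

```python
# Hedged sketch: online (incremental) updating of a PE model on the fly.
import torch
import torch.nn as nn

class TinyApeModel(nn.Module):
    """Trivial stand-in for a real APE model, for illustration only."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, x):
        return self.proj(x)

model = TinyApeModel()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def online_update(mt_vec, pe_vec):
    """One gradient step per confirmed post-edit, instead of batch retraining."""
    opt.zero_grad()
    loss = loss_fn(model(mt_vec), pe_vec)
    loss.backward()
    opt.step()
    return loss.item()

# Each time the translator confirms a post-edit, update immediately.
for _ in range(3):
    mt_vec, pe_vec = torch.randn(64), torch.randn(64)  # placeholder encodings
    print(online_update(mt_vec, pe_vec))
```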
A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer low-frequency but strongly associated formulaic sequences (FSs), and more high-frequency FSs. These observations can be related to the differences between second language learners of various levels, and between translated and untranslated texts. The comparison between the neural machine translation systems indicates that some systems produce more FSs of both types than others.
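The frequency/association distinction can be illustrated with NLTK’s collocation tools on a toy token list standing in for a real corpus: raw frequency surfaces one kind of FS, pointwise mutual information the other.

```python
# Hedged sketch: high-frequency vs. strongly-associated bigrams.
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

tokens = ("in the wake of the storm in the city "
          "in the wake of the crisis").split()
finder = BigramCollocationFinder.from_words(tokens)
measures = BigramAssocMeasures()

print(finder.nbest(measures.raw_freq, 3))  # high-frequency sequences
print(finder.nbest(measures.pmi, 3))       # strongly-associated sequences
```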
The MultiTraiNMT Erasmus+ project aims to develop an open, innovative syllabus in neural machine translation (NMT) for language learners and translators as multilingual citizens. Machine translation is seen as a resource that can support citizens in their attempt to acquire and develop language skills if they are trained in an informed and critical way. Machine translation could thus help tackle the mismatch between the EU’s aim of having multilingual citizens who speak at least two foreign languages and the current situation, in which citizens generally fall far short of this objective. The training materials consist of an open-access coursebook, an open-source NMT web application called MutNMT for training purposes, and corresponding activities.
Language technology is already largely adopted by most Language Service Providers (LSPs) and integrated into their traditional translation processes. In this context, there are many different approaches to applying Post-Editing (PE) to machine-translated text, involving different workflow processes and steps that can be more or less effective and favorable. In the present paper, we propose a 3-step Post-Editing Workflow (PEW). Drawing on industry insight, this paper aims to provide a basic framework for LSPs and post-editors on how to streamline post-editing workflows in order to improve quality, achieve higher profitability and better return on investment, and standardize and facilitate internal processes in terms of management and linguist effort when it comes to PE services. We argue that a comprehensive PEW consists of three essential tasks: Pre-Editing, Post-Editing and Annotation/Machine Translation (MT) evaluation (Guerrero, 2018), supported by three essential roles: Pre-Editor, Post-Editor and Annotator (Gene, 2020). Furthermore, the present paper examines the training challenges arising from this PEW, supported by empirical research results from a digital survey among language industry professionals (Gene, 2020), which was conducted in the context of a Post-Editing webinar. Its sample comprised 51 representatives of LSPs and 12 representatives of SLVs (Single Language Vendors).
This paper offers a comparative evaluation of four commercial ASR systems, assessed according to the post-editing effort required to reach “publishable” quality and according to the number of errors they produce. For the error annotation task, an original typology for transcription errors is proposed. This study also seeks to examine whether there is a difference in the performance of these systems between native and non-native English speakers. The experimental results suggest that, among the four systems, Trint obtains the best scores. It is also observed that most systems perform noticeably better with native speakers and that all systems are most prone to fluency errors.
The COVID-19 pandemic upended translation teaching globally. The forced move to online teaching represented a gargantuan challenge for anyone experienced only in face-to-face teaching. Online translation teaching requires distinct approaches to guarantee that students can reach the targeted learning goals. This paper presents a literature review on the provision of effective feedback in the light of these drastic changes in translation teaching, as well as a description of how existing research on online feedback for translation training has been applied to the design of online courses in the translation program at Rutgers University.
Translation Studies and, more specifically, its subfield Descriptive Translation Studies [Holmes 1988/2000] is, according to many scholars [Gambier, 2009; Nenopoulou, 2007; Munday, 2001/2008; Hermans, 1999; Snell-Hornby et al., 1994, etc.], a highly interdisciplinary field of study. The aim of the present paper is to describe the role of polysemiotic corpora in the study of university website localization from a multidisciplinary perspective. More specifically, the paper gives an overview of ongoing postdoctoral research on the identity formation of Greek university websites on the web, focusing on the methodology adopted for corpus compilation, based on methodological tools and concepts from fields such as Translation Studies, social semiotics, cultural studies, critical discourse analysis and marketing. The objects of comparative analysis are Greek and French original and translated (into English) university websites, as well as original British and American university website versions. Research findings so far have shown that polysemiotic corpora can be a valuable tool not only for quantitative but also for qualitative analysis of website localization, both for scholars and for translation professionals working with multimodal genres.