Mikel L. Forcada

Also published as: Mikel Forcada

2024

SmartBiC, an 18-month innovation project funded by the Spanish Government, aims at improving the full process of collecting, filtering and selecting in-domain parallel content to be used for machine translation and language model tuning purposes in industrial settings. Based on state-of-the-art technology in the free/open-source parallel web corpora harvester Bitextor, SmartBic develops a web-based application around it including novel components such as a language- and domain-focused crawler and a domain-specific corpora selector. SmartBic also addresses specific industrial use cases for individual components of the Bitextor pipeline, such as parallel data cleaning. Relevant improvements to the current Bitextor pipeline will be publicly released.

pdf abs
Community-driven machine translation for the Catalan language at Softcatalà
Xavi Ivars-Ribes | Jordi Mas | Marc Riera | Jaume Ortolà | Mikel Forcada | David Cànovas
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)

Among the services provided by Softcatalà, a non-profit 25-year-old grassroots organization that localizes software into Catalan and develops software to ease the generation of Catalan content, one of the most used is its machine translation (MT) service, which provides both rule-based MT and neural MT between Catalan and twelve other languages. Development occurs in a community-supported, transparent way by using free/open-source software and open language resources. This paper briefly describes the MT services at Softcatalà: the offered functionalities, the data, and the software used to provide them.

2023

We present the most relevant results of the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages in its second year. To date, parallel and monolingual corpora have been produced for seven low-resourced European languages by crawling large amounts of textual data from selected top-level domains of the Internet; both human and automatic evaluation show its usefulness. In addition, several large language models pretrained on MaCoCu data have been published, as well as the code used to collect and curate the data.

2022

pdf abs
MultitraiNMT Erasmus+ project: Machine Translation Training for multilingual citizens (multitrainmt.eu)
Mikel L. Forcada | Pilar Sánchez-Gijón | Dorothy Kenny | Felipe Sánchez-Martínez | Juan Antonio Pérez Ortiz | Riccardo Superbo | Gema Ramírez Sánchez | Olga Torres-Hostench | Caroline Rossi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The MultitraiNMT Erasmus+ project has developed an open innovative syl-labus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators. The training materials include an open access coursebook with more than 250 activities and a pedagogical NMT interface called MutNMT that allows users to learn how neural machine translation works. These materials will allow students to develop the technical and ethical skills and competences required to become informed, critical users of machine translation in their own language learn-ing and translation practice. The pro-ject started in July 2019 and it will end in July 2022.

We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages. The approach followed consists of crawling large amounts of textual data from carefully selected top-level domains of the Internet, and then applying a curation and enrichment pipeline. In addition to corpora, the project will release successive versions of the free/open-source web crawling and curation software used.

2021

In the media industry and the focus of global reporting can shift overnight. There is a compelling need to be able to develop new machine translation systems in a short period of time and in order to more efficiently cover quickly developing stories. As part of the EU project GoURMET and which focusses on low-resource machine translation and our media partners selected a surprise language for which a machine translation system had to be built and evaluated in two months(February and March 2021). The language selected was Pashto and an Indo-Iranian language spoken in Afghanistan and Pakistan and India. In this period we completed the full pipeline of development of a neural machine translation system: data crawling and cleaning and aligning and creating test sets and developing and testing models and and delivering them to the user partners. In this paperwe describe rapid data creation and experiments with transfer learning and pretraining for this low-resource language pair. We find that starting from an existing large model pre-trained on 50languages leads to far better BLEU scores than pretraining on one high-resource language pair with a smaller model. We also present human evaluation of our systems and which indicates that the resulting systems perform better than a freely available commercial system when translating from English into Pashto direction and and similarly when translating from Pashto into English.

2020

We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering. We also describe the parallel corpora released and evaluate their quality and their usefulness to create machine translation systems.

pdf abs
A multi-source approach for Breton–French hybrid machine translation
Víctor M. Sánchez-Cartagena | Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Corpus-based approaches to machine translation (MT) have difficulties when the amount of parallel corpora to use for training is scarce, especially if the languages involved in the translation are highly inflected. This problem can be addressed from different perspectives, including data augmentation, transfer learning, and the use of additional resources, such as those used in rule-based MT. This paper focuses on the hybridisation of rule-based MT and neural MT for the Breton–French under-resourced language pair in an attempt to study to what extent the rule-based MT resources help improve the translation quality of the neural MT system for this particular under-resourced language pair. We combine both translation approaches in a multi-source neural MT architecture and find out that, even though the rule-based system has a low performance according to automatic evaluation metrics, using it leads to improved translation quality.

pdf abs
An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Felipe Sánchez-Martínez | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Mikel L. Forcada | Miquel Esplà-Gomis | Andrew Secker | Susie Coleman | Julie Wall
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.

2019

pdf abs
Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality
Scarton Scarton | Mikel L. Forcada | Miquel Esplà-Gomis | Lucia Specia
Proceedings of the 16th International Conference on Spoken Language Translation

Devising metrics to assess translation quality has always been at the core of machine translation (MT) research. Traditional automatic reference-based metrics, such as BLEU, have shown correlations with human judgements of adequacy and fluency and have been paramount for the advancement of MT system development. Crowd-sourcing has popularised and enabled the scalability of metrics based on human judgments, such as subjective direct assessments (DA) of adequacy, that are believed to be more reliable than reference-based automatic metrics. Finally, task-based measurements, such as post-editing time, are expected to provide a more de- tailed evaluation of the usefulness of translations for a specific task. Therefore, while DA averages adequacy judgements to obtain an appraisal of (perceived) quality independently of the task, and reference-based automatic metrics try to objectively estimate quality also in a task-independent way, task-based metrics are measurements obtained either during or after performing a specific task. In this paper we argue that, although expensive, task-based measurements are the most reliable when estimating MT quality in a specific task; in our case, this task is post-editing. To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort. Our results show that task-based metrics comparing machine-translated and post-edited versions are the best at tracking post-editing effort, as expected. These metrics are followed by DA, and then by metrics comparing the machine-translated version and independent references. We suggest that MT practitioners should be aware of these differences and acknowledge their implications when decid- ing how to evaluate MT for post-editing purposes.

pdf bib
Proceedings of Machine Translation Summit XVII: Research Track
Mikel Forcada | Andy Way | Barry Haddow | Rico Sennrich
Proceedings of Machine Translation Summit XVII: Research Track

pdf bib
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
Mikel Forcada | Andy Way | John Tinsley | Dimitar Shterionov | Celia Rico | Federico Gaspari
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

pdf
ParaCrawl: Web-scale parallel corpora for the languages of the EU
Miquel Esplà | Mikel Forcada | Gema Ramírez-Sánchez | Hieu Hoang
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2018

pdf bib
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Maja Popović | Celia Rico | André Martins | Joachim Van den Bogaert | Mikel L. Forcada
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

pdf abs
Learning to use machine translation on the Translation Commons Learn portal
Jeannette Stewart | Mikel L. Forcada
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

We describe the Learn portal of Translation Commons (TC), a self-managed community of volunteer translators community aimed at sharing tools, resources and initiatives for the translation community as a whole. Members are encouraged to upload and share their free resources on the platform and to create free courses and tutorials. Specifically there are no educational material on machine translation yet and we invite experts to contribute.

pdf abs
Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting
Mikel L. Forcada | Carolina Scarton | Lucia Specia | Barry Haddow | Alexandra Birch
Proceedings of the Third Conference on Machine Translation: Research Papers

A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.

pdf abs
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
Philipp Koehn | Huda Khayrallah | Kenneth Heafield | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.

pdf abs
UAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We describe the Universitat d’Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018. Our approach to word-level MT QE builds on previous work to mark the words in the machine-translated sentence as OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.

2017

pdf
One-parameter models for sentence-level post-editing effort estimation
Mikel L. Forcada | Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Lucia Specia
Proceedings of Machine Translation Summit XVI: Research Track

pdf
Usefulness of MT output for comprehension — an analysis from the point of view of linguistic intercomprehension
Kenneth Jordan Núñez | Mikel L. Forcada | Esteve Clua
Proceedings of Machine Translation Summit XVI: Research Track

2016

pdf
Apertium: a free/open source platform for machine translation and basic language technology
Mikel L. Forcada | Francis M. Tyers
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf
Abu-MaTran: automatic building of machine translation
Antonio Toral | Sergio Ortiz Rojas | Mikel Forcada | Nikola Lubesic | Prokopis Prokopidis
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf abs
Fuzzy-match repair using black-box machine translation systems: what can be expected?
John Ortega | Felipe Sánchez-Martínez | Mikel Forcada
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

Computer-aided translation (CAT) tools often use a translation memory (TM) as the key resource to assist translators. A TM contains translation units (TU) which are made up of source and target language segments; translators use the target segments in the TU suggested by the CAT tool by converting them into the desired translation. Proposals from TMs could be made more useful by using techniques such as fuzzy-match repair (FMR) which modify words in the target segment corresponding to mismatches identified in the source segment. Modifications in the target segment are done by translating the mismatched source sub-segments using an external source of bilingual information (SBI) and applying the translations to the corresponding positions in the target segment. Several combinations of translated sub-segments can be applied to the target segment which can produce multiple repair candidates. We provide a formal algorithmic description of a method that is capable of using any SBI to generate all possible fuzzy-match repairs and perform an oracle evaluation on three different language pairs to ascertain the potential of the method to improve translation productivity. Using DGT-TM translation memories and the machine system Apertium as the single source to build repair operators in three different language pairs, we show that the best repaired fuzzy matches are consistently closer to reference translations than either machine-translated segments or unrepaired fuzzy matches.

pdf abs
Ranking suggestions for black-box interactive translation prediction systems with multilayer perceptrons
Daniel Torregrosa | Juan Antonio Pérez-Ortiz | Mikel Forcada
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

The objective of interactive translation prediction (ITP), a paradigm of computer-aided translation, is to assist professional translators by offering context-based computer-generated suggestions as they type. While most state-of-the-art ITP systems are tightly coupled to a machine translation (MT) system (often created ad-hoc for this purpose), our proposal follows a resourceagnostic approach, one that does not need access to the inner workings of the bilingual resources (MT systems or any other bilingual resources) used to generate the suggestions, thus allowing to include new resources almost seamlessly. As we do not expect the user to tolerate more than a few proposals each time, the set of potential suggestions need to be filtered and ranked; the resource-agnostic approach has been evaluated before using a set of intuitive length-based and position-based heuristics designed to determine which suggestions to show, achieving promising results. In this paper, we propose a more principled suggestion ranking approach using a regressor (a multilayer perceptron) that achieves significantly better results.

pdf
Bitextor’s participation in WMT’16: shared task on document alignment
Miquel Esplà-Gomis | Mikel Forcada | Sergio Ortiz-Rojas | Jorge Ferrández-Tordera
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
UAlacant word-level and phrase-level machine translation quality estimation systems at WMT 2016
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
Stand-off Annotation of Web Content as a Legally Safer Alternative to Crawling for Distribution
Mikel L. Forcada | Miquel Esplà-Gomis | Juan Antonio Pérez-Ortiz
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2015

pdf
Using on-line available sources of bilingual information for word-level machine translation quality estimation
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf
A general framework for minimizing translation effort: towards a principled combination of translation technologies in computer-aided translation
Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf
Evaluating machine translation for assimilation via a gap-filling task
Ekaterina Ageeva | Francis M. Tyers | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf
Unsupervised training of maximum-entropy models for lexical selection i in rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martinez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf
UAlacant word-level machine translation quality estimation system at WMT 2015
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf
Evaluating machine translation for assimilation via a gap-filling task
Ekaterina Ageeva | Mikel L. Forcada | Francis M. Tyers | Juan Antonio Pérez-Ortiz
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf
Unsupervised training of maximum-entropy models for lexical selection in rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf abs
On the annotation of TMX translation memories for advanced leveraging in computer-aided translation
Mikel Forcada
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The term advanced leveraging refers to extensions beyond the current usage of translation memory (TM) in computer-aided translation (CAT). One of these extensions is the ability to identify and use matches on the sub-segment level ― for instance, using sub-sentential elements when segments are sentences― to help the translator when a reasonable fuzzy-matched proposal is not available; some such functionalities have started to become available in commercial CAT tools. Resources such as statistical word aligners, external machine translation systems, glossaries and term bases could be used to identify and annotate segment-level translation units at the sub-segment level, but there is currently no single, agreed standard supporting the interchange of sub-segmental annotation of translation memories to create a richer translation resource. This paper discusses the capabilities and limitations of some current standards, envisages possible alternatives, and ends with a tentative proposal which slightly abuses (repurposes) the usage of existing elements in the TMX standard.

pdf
Black-box integration of heterogeneous bilingual resources into an interactive translation system
Juan Antonio Pérez-Ortiz | Daniel Torregrosa | Mikel Forcada
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

pdf
An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknknown words
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartegna | Felipe Sánchez-Martínez | Rafael C. Carrasco | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf abs
Using any machine translation source for fuzzy-match repair in a computer-aided translation setting
John E. Ortega | Felipe Sánchez-Martinez | Mikel L. Forcada
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

When a computer-assisted translation (CAT) tool does not find an exact match for the source segment to translate in its translation memory (TM), translators must use fuzzy matches that come from translation units in the translation memory that do not completely match the source segment. We explore the use of a fuzzy-match repair technique called patching to repair translation proposals from a TM in a CAT environment using any available machine translation system, or any external bilingual source, regardless of its internals. Patching attempts to aid CAT tool users by repairing fuzzy matches and proposing improved translations. Our results show that patching improves the quality of translation proposals and reduces the amount of edit operations to perform, especially when a specific set of restrictions is applied.

2012

pdf
UAlacant: Using Online Machine Translation for Cross-Lingual Textual Entailment
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
Flexible finite-state lexical selection for rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

2011

pdf bib
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
Mikel L. Forcada | Heidi Depraetere | Vincent Vandeghinste
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

pdf
Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited
Miquel Esplà | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

pdf
Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
Sandipan Dandapat | Sara Morrissey | Andy Way | Mikel L. Forcada
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

pdf
Using machine translation in computer-aided translation to suggest the target-side words to change
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of Machine Translation Summit XIII: Papers

2010

2009

pdf abs
The Apertium machine translation platform: Five years on
Mikel L. Forcada | Francis M. Tyers | Gema Ramírez-Sánchez
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

This paper describes Apertium: a free/open-source machine translation platform (engine, toolbox and data), its history, its philosophy of design, its technology, the community of developers, the research and business based on it, and its prospects and challenges, now that it is five years old.

2007

pdf
Automatic induction of shallow-transfer rules for open-source machine translation
Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2005

We present the current status of development of an open architecture for the translation from Spanish into Basque. The machine translation architecture uses an open source analyser for Spanish and new modules mainly based on finite-state transducers. The project is integrated in the OpenTrad initiative, a larger government funded project shared among different universities and small companies, which will also include MT engines for translation among the main languages in Spain. The main objective is the construction of an open, reusable and interoperable framework. This paper describes the design of the engine, the formats it uses for the communication among the modules, the modules reused from other project named Matxin and the new modules we are building.

By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based upon that of systems we have already developed, such as interNOSTRUM for Spanish—Catalan and Traductor Universia for Spanish—Portuguese, (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic (dictionaries, rules) and document format management files, (c) compilers converting these data into the high-speed (tens of thousands of words a second) format used by the engine, and (d) pilot linguistic data for Spanish—Catalan and Spanish—Galician and format management specifications for the HTML, RTF and plain text formats. After describing very briefly this toolbox, this paper aims at exploring possible consequences of the availability of this architecture, including the community-driven development of machine translation systems for languages lacking this kind of linguistic technology.

pdf
LIHLA: Shared Task System Description
Helena M. Caseli | Maria G. V. Nunes | Mikel L. Forcada
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

pdf
Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system
Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz | Mikel L. Forcada
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

pdf bib abs
A 45-hour computers in translation course
Mikel L. Forcada
Workshop on Teaching Translation Technologies and Tools

This paper describes how a 45-hour Computers in Translation course is actually taught to 3rd-year translation students at the University of Alacant; the course described started in year 1995–1996 and has undergone substantial redesign until its present form. It is hoped that this description may be of use to instructors who are forced to teach a similar subject in such as small slot of time and need some design guidelines.

2002

pdf
Explaining real MT to translators: between compositional semantics and word-for-word
Mikel L. Forcada
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation

pdf
Incremental construction and maintenance of morphological analysers based on augmented letter transducers
Alicia Garrido-Alenda | Mikel L. Forcada | Rafael C. Carrasco
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf bib
Using multilingual content on the web to build fast finite-state direct translation systems
Mikel L. Forcada
Workshop on machine translation roadmap

pdf
Incremental Construction and Maintenance of Minimal Finite-State Automata
Rafael C. Carrasco | Mikel L. Forcada
Computational Linguistics, Volume 28, Number 2, June 2002

2001

pdf abs
Discovering machine translation strategies beyond word-for-word translation: a laboratory assignment
Juan Antonio Pérez-Ortiz | Mikel L. Forcada
Workshop on Teaching Machine Translation

It is a common mispreconception to say that machine translation programs translate word-for-word, but real systems follow strategies which are much more complex. This paper proposes a laboratory assignment to study the way in which some commercial machine translation programs translate whole sentences and how the translation differs from a word-for-word translation. Students are expected to infer some of these extra strategies by observing the outcome of real systems when translating a set of sentences designed on purpose. The assignment also makes students aware of the difficulty of constructing such programs while bringing some technological light into the apparent “magic” of machine translation.