Matīss Rikters

2022

pdf abs
Machine Translation for Livonian: Catering to 20 Speakers
Matīss Rikters | Marili Tomingas | Tuuli Tuisk | Valts Ernštreits | Mark Fishel
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Livonian is one of the most endangered languages in Europe with just a tiny handful of speakers and virtually no publicly available corpora. In this paper we tackle the task of developing neural machine translation (NMT) between Livonian and English, with a two-fold aim: on one hand, preserving the language and on the other – enabling access to Livonian folklore, lifestories and other textual intangible heritage as well as making it easier to create further parallel corpora. We rely on Livonian’s linguistic similarity to Estonian and Latvian and collect parallel and monolingual data for the four languages for translation experiments. We combine different low-resource NMT techniques like zero-shot translation, cross-lingual transfer and synthetic data creation to reach the highest possible translation quality as well as to find which base languages are empirically more helpful for transfer to Livonian. The resulting NMT systems and the collected monolingual and parallel data, including a manually translated and verified translation benchmark, are publicly released via OPUS and Huggingface repositories.

2020

pdf abs
Document-aligned Japanese-English Conversation Parallel Corpus
Matīss Rikters | Ryokan Ri | Tong Li | Toshiaki Nakazawa
Proceedings of the Fifth Conference on Machine Translation

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but not document-level (DL) MT, which is difficult to 1) train with little amount of DL data; and 2) evaluate, as the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing. As for the second issue, we manually identify the main areas where SL MT fails to produce adequate translations in lack of context. We then create an evaluation set where these phenomena are annotated to alleviate automatic evaluation of DL systems. We train MT models using our corpus to demonstrate how using context leads to improvements.

pdf
Customized Neural Machine Translation Systems for the Swiss Legal Domain
Rubén Martínez-Domínguez | Matīss Rikters | Artūrs Vasiļevskis | Mārcis Pinnis | Paula Reichenberg
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

pdf abs
The University of Tokyo’s Submissions to the WAT 2020 Shared Task
Matīss Rikters | Toshiaki Nakazawa | Ryokan Ri
Proceedings of the 7th Workshop on Asian Translation

The paper describes the development process of the The University of Tokyo’s NMT systems that were submitted to the WAT 2020 Document-level Business Scene Dialogue Translation sub-task. We describe the data processing workflow, NMT system training architectures, and automatic evaluation results. For the WAT 2020 shared task, we submitted 12 systems (both constrained and unconstrained) for English-Japanese and Japanese-English translation directions. The submitted systems were trained using Transformer models and one was a SMT baseline.

2019

pdf abs
Designing the Business Conversation Corpus
Matīss Rikters | Ryokan Ri | Tong Li | Toshiaki Nakazawa
Proceedings of the 6th Workshop on Asian Translation

While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benefits from its use.

pdf abs
Tilde’s Machine Translation Systems for WMT 2019
Marcis Pinnis | Rihards Krišlauks | Matīss Rikters
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

The paper describes the development process of Tilde’s NMT systems for the WMT 2019 shared task on news translation. We trained systems for the English-Lithuanian and Lithuanian-English translation directions in constrained and unconstrained tracks. We build upon the best methods of the previous year’s competition and combine them with recent advancements in the field. We also present a new method to ensure source domain adherence in back-translated data. Our systems achieved a shared first place in human evaluation.

2018

pdf abs
Tilde’s Machine Translation Systems for WMT 2018
Mārcis Pinnis | Matīss Rikters | Rihards Krišlauks
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The paper describes the development process of the Tilde’s NMT systems that were submitted for the WMT 2018 shared task on news translation. We describe the data filtering and pre-processing workflows, the NMT system training architectures, and automatic evaluation results. For the WMT 2018 shared task, we submitted seven systems (both constrained and unconstrained) for English-Estonian and Estonian-English translation directions. The submitted systems were trained using Transformer models.

pdf
Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages
Matīss Rikters | Mārcis Pinnis | Rihards Krišlauks
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
Paying Attention to Multi-Word Expressions in Neural Machine Translation
Matīss Rikters | Ondřej Bojar
Proceedings of Machine Translation Summit XVI: Research Track

pdf
Confidence through Attention
Matīss Rikters | Mark Fishel
Proceedings of Machine Translation Summit XVI: Research Track

pdf
C-3MA: Tartu-Riga-Zurich Translation Systems for WMT17
Matīss Rikters | Chantal Amrhein | Maksym Del | Mark Fishel
Proceedings of the Second Conference on Machine Translation

2016

pdf abs
Syntax-based Multi-system Machine Translation
Matīss Rikters | Inguna Skadiņa
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes a hybrid machine translation system that explores a parser to acquire syntactic chunks of a source sentence, translates the chunks with multiple online machine translation (MT) system application program interfaces (APIs) and creates output by combining translated chunks to obtain the best possible translation. The selection of the best translation hypothesis is performed by calculating the perplexity for each translated chunk. The goal of this approach is to enhance the baseline multi-system hybrid translation (MHyT) system that uses only a language model to select best translation from translations obtained with different APIs and to improve overall English ― Latvian machine translation quality over each of the individual MT APIs. The presented syntax-based multi-system translation (SyMHyT) system demonstrates an improvement in terms of BLEU and NIST scores compared to the baseline system. Improvements reach from 1.74 up to 2.54 BLEU points.

pdf bib abs
Neural Network Language Models for Candidate Scoring in Hybrid Multi-System Machine Translation
Matīss Rikters
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

This paper presents the comparison of how using different neural network based language modeling tools for selecting the best candidate fragments affects the final output translation quality in a hybrid multi-system machine translation setup. Experiments were conducted by comparing perplexity and BLEU scores on common test cases using the same training data set. A 12-gram statistical language model was selected as a baseline to oppose three neural network based models of different characteristics. The models were integrated in a hybrid system that depends on the perplexity score of a sentence fragment to produce the best fitting translations. The results show a correlation between language model perplexity and BLEU scores as well as overall improvements in BLEU.