Sergi Àlvarez Vidal

Also published as: Sergi Alvarez-Vidal, Sergi Alvarez Vidal

2025

Fine-tuning and evaluation of NMT models for literary texts using RomCro v.2.0
Bojana Mikelenić | Antoni Oliver | Sergi Àlvarez Vidal
Proceedings of the Second Workshop on Creative-text Translation and Technology (CTT)

This paper explores the fine-tuning and evaluation of neural machine translation (NMT) models for literary texts using RomCro v.2.0, an expanded multilingual and multidirectional parallel corpus. RomCro v.2.0 is based on RomCro v.1.0, but includes additional literary works, as well as texts in Catalan, making it a valuable resource for improving MT in underrepresented language pairs. Given the challenges of literary translation, where style, narrative voice, and cultural nuances must be preserved, fine-tuning on high-quality domain-specific data is essential for enhancing MT performance. We fine-tune existing NMT models with RomCro v.2.0 and evaluate their performance for six different language combinations using automatic metrics and for Spanish-Croatian and French-Catalan using manual evaluation. Results indicate that fine-tuned models outperform general-purpose systems, achieving greater fluency and stylistic coherence. These findings support the effectiveness of corpus-driven fine-tuning for literary translation and highlight the importance of curated, high-quality corpora.

Using Translation Techniques to Characterize MT Outputs
Sergi Alvarez-Vidal | Maria Do Campo | Christian Olalla-Soler | Pilar Sánchez-Gijón
Proceedings of Machine Translation Summit XX: Volume 1

While current NMT and GPT models improve fluency and context awareness, they struggle with creative texts, where figurative language and stylistic choices are crucial. Current evaluation methods fail to capture these nuances, which requires a more descriptive approach. We propose a taxonomy based on translation techniques to assess machine-generated translations more comprehensively. The pilot study we conducted comparing human and machine-produced translations reveals that human translations employ a wider range of techniques, enhancing naturalness and cultural adaptation. NMT and GPT models, even with prompting, tend to simplify content and introduce accuracy errors. Our findings highlight the need for refined frameworks that consider stylistic and contextual accuracy, ultimately bridging the gap between human and machine translation performance.

2024

LitPC: A set of tools for building parallel corpora from literary works
Antoni Oliver | Sergi Alvarez-Vidal
Proceedings of the 1st Workshop on Creative-text Translation and Technology

In this paper, we describe the LitPC toolkit, a variety of tools and methods designed for the quick and effective creation of parallel corpora derived from literary works. This toolkit can be a useful resource due to the scarcity of curated parallel texts for this domain. We also feature a case study describing the creation of a Russian-English parallel corpus based on the literary works by Leo Tolstoy. Furthermore, an augmented version of this corpus is used to both train and assess neural machine translation systems specifically adapted to the author’s style.

Training an NMT system for legal texts of a low-resource language variety South Tyrolean German - Italian
Antoni Oliver | Sergi Alvarez-Vidal | Egon Stemle | Elena Chiocchetti
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

This paper illustrates the process of training and evaluating NMT systems for a language pair that includes a low-resource language variety. A parallel corpus of legal texts for Italian and South Tyrolean German has been compiled, with South Tyrolean German being the low-resourced language variety. As the size of the compiled corpus is insufficient for the training, we have combined the corpus with several parallel corpora using data weighting at sentence level. We then performed an evaluation of each combination and of two popular commercial systems.

2023
