Albert Llorens


2022


Automatic Post-Editing of MT Output Using Large Language Models
Blanca Vidal | Albert Llorens | Juan Alonso
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

This presentation will show two experiments conducted to evaluate the adequacy of OpenAI’s GPT-3 (as a representative of Large Language Models) for post-editing and translating texts from English into Spanish, using a glossary of terms to ensure term consistency. The experiments are motivated by a use case in ULG MT Production, where we need to improve the usage of terminology glossaries in our NMT system. The purpose of the experiments is to take advantage of GPT-3’s outstanding capabilities to generate text for completion and editing. We have used the edits endpoint to post-edit the output of an NMT system using a glossary, and the completions endpoint to translate the source text, including the glossary term list in the corresponding GPT-3 prompt. While the results are promising, they also show that there is room for improvement by fine-tuning the models, working on prompt engineering, and adjusting the request parameters.
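
As an illustration of the translation setup described in this abstract, where the glossary term list is included in the prompt, the following is a minimal sketch in Python. The function name `build_translation_prompt`, the prompt wording, and the sample glossary are all assumptions made for this example; the abstract does not specify the actual prompt used, and no API call is made here.

```python
def build_translation_prompt(source_text, glossary):
    """Build a prompt string that embeds a glossary term list.

    Hypothetical helper: the prompt layout is an assumption,
    not the one used in the experiments.
    """
    lowered = source_text.lower()
    # Keep only glossary entries whose English term occurs in the source text.
    relevant = {en: es for en, es in glossary.items() if en.lower() in lowered}
    term_lines = "\n".join(f"- {en} = {es}" for en, es in sorted(relevant.items()))
    return (
        "Translate the following text from English into Spanish.\n"
        "Use these glossary terms:\n"
        f"{term_lines}\n\n"
        f"English: {source_text}\n"
        "Spanish:"
    )


# Illustrative glossary (invented for this sketch).
sample_glossary = {
    "control panel": "panel de control",
    "firewall": "cortafuegos",
}
prompt = build_translation_prompt("Open the control panel.", sample_glossary)
print(prompt)
```

Filtering the glossary down to the terms that actually occur in the source keeps the prompt short; whether the original experiments did this is not stated.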

2001

Collapsing morphological information in lexical databases for NLP applications
Juan A. Alonso | Ramón Fanlo | Albert Llorens
Proceedings of Machine Translation Summit VIII

The morphology of inflectional languages poses specific problems in the processing of morphological alternations. Regular alternations at morpheme boundaries can be elegantly captured by the use of rule formalisms based on the two-level morphology model. Stem alternations and completely irregular alternations at morpheme boundaries, however, need to be captured in some way in the lexicon. This paper presents four possible solutions to the problem and makes a claim in favor of one of them. The proposed approach makes use of feature bundles that contain the necessary linguistic information to uniquely identify allomorphic variations of stems in the lexicon. The proposal is an improvement in that it simplifies the representation of allomorphic variations in the lexicon by avoiding duplication of stem allomorphs to capture cross-combination of several morphosyntactic features in stem+flex sequences.
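
To make the idea of feature bundles concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not the paper's formalism: the lexicon layout, the feature names (`tense`, `mood`, `person_number`), and the function `select_stem` are hypothetical. Each stem allomorph is stored once, together with a bundle that lists the feature values it is compatible with; features left out of the bundle are unconstrained, so one entry covers many feature combinations without being duplicated.

```python
# Hypothetical lexicon for the Spanish verb "contar" (stems cuent-/cont-).
# Each allomorph appears once; its bundle maps a feature name to the set
# of values the allomorph is compatible with.
LEXICON = {
    "contar": [
        {
            "stem": "cuent",
            "bundle": {
                "tense": {"present"},
                "person_number": {"1sg", "2sg", "3sg", "3pl"},
                # "mood" is omitted, so indicative and subjunctive both match.
            },
        },
        # Empty bundle: default allomorph, used when no earlier entry matches.
        {"stem": "cont", "bundle": {}},
    ],
}


def select_stem(lemma, context):
    """Return the first stem allomorph whose bundle is satisfied by context."""
    for entry in LEXICON[lemma]:
        if all(context.get(feature) in allowed
               for feature, allowed in entry["bundle"].items()):
            return entry["stem"]
    raise LookupError(f"no stem allomorph of {lemma!r} matches {context!r}")


present_1sg = {"tense": "present", "mood": "indicative", "person_number": "1sg"}
present_1pl = {"tense": "present", "mood": "subjunctive", "person_number": "1pl"}
print(select_stem("contar", present_1sg) + "o")     # cuento
print(select_stem("contar", present_1pl) + "emos")  # contemos
```

Without the bundle, the `cuent` allomorph would need a separate entry for each tense, mood, and person/number combination it occurs in, which is the kind of duplication the abstract says the proposal avoids.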