Jusèp Loís Sans Socasau

2024

In this paper, we describe the process of creating the FLORES+ datasets for several Romance languages spoken in Spain, namely Aragonese, Aranese, Asturian, and Valencian. The Aragonese and Aranese datasets are entirely new additions to the FLORES+ multilingual benchmark. An initial version of the Asturian dataset was already available in FLORES+, and our work focused on a thorough revision. Similarly, FLORES+ included a Catalan dataset, which we adapted to the Valencian variety spoken in the Valencian Community. The development of the Aragonese, Aranese, and revised Asturian FLORES+ datasets was undertaken as part of a WMT24 shared task on translation into low-resource languages of Spain.

Co-authors

Alejandro Pardos

Juan Antonio Pérez-Ortiz

Víctor M. Sánchez-Cartagena

Felipe Sánchez-Martínez

Cristina Valdés

Venues

wmt
ws
