Asier Gutiérrez-Fandiño
2025
Cost-Effective E-Commerce Catalog Translation at Scale Ensuring Named Entity Protection
Asier Gutiérrez-Fandiño | Jorge Yero Salazar | Clement Ruin | Alejandro Quintero-Roba | Shangeetha Ravichandran | Jesus Perez-Martin | Pankaj Adsul | Suruchi Garg | Leonardo Lezcano
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
We present an enterprise-grade translation platform for global e-commerce that combines daily batch and real-time API pipelines with optimized T5-based models and a Reference Generator to enforce >99% non-translatable entity preservation. A linguist-driven rule engine and explainable evaluation framework (BLEU, COMET, and a custom e-commerce metric) enable continuous quality improvements. Deployed on GPU-accelerated inference servers and CPU-based processing nodes, our system processes millions of listings per day with sub-second latency and achieves 10×–100× cost savings over general-purpose LLMs for English→Spanish and English→French translation, all while version-tracking every update for robust enterprise rollouts.
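The entity-preservation step described above can be understood as a mask-and-restore wrapper around the translation model. The following is a minimal illustrative sketch of that general technique in Python; the public Helsinki-NLP/opus-mt-en-es checkpoint, the placeholder scheme, and the example entity list are assumptions for demonstration and are not the paper's Reference Generator or its optimized T5 models.

# Illustrative mask-and-restore sketch of non-translatable entity protection.
# The checkpoint, entity list, and placeholder format are assumptions; they are
# not the paper's Reference Generator or its deployed T5-based models.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")  # public en->es stand-in

def protect_entities(text, entities):
    """Replace non-translatable entities with opaque placeholder tokens."""
    mapping = {}
    for i, entity in enumerate(entities):
        placeholder = f"XENT{i}X"  # toy placeholder; real systems use tokenizer-safe markers
        if entity in text:
            text = text.replace(entity, placeholder)
            mapping[placeholder] = entity
    return text, mapping

def restore_entities(text, mapping):
    """Put the original entity strings back into the translated text."""
    for placeholder, entity in mapping.items():
        text = text.replace(placeholder, entity)
    return text

listing = "Nike Air Max 270 running shoes with breathable mesh upper"
masked, mapping = protect_entities(listing, ["Nike Air Max 270"])
translation = translator(masked, max_length=64)[0]["translation_text"]
print(restore_entities(translation, mapping))

In the deployed system, the Reference Generator and the linguist-driven rule engine decide which spans are non-translatable; the toy string-replacement scheme here only illustrates the underlying idea.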
2022
Pretrained Biomedical Language Models for Clinical NLP in Spanish
Casimiro Pio Carrino | Joan Llop | Marc Pàmies | Asier Gutiérrez-Fandiño | Jordi Armengol-Estapé | Joaquín Silveira-Ocampo | Alfonso Valencia | Aitor Gonzalez-Agirre | Marta Villegas
Proceedings of the 21st Workshop on Biomedical Language Processing
This work presents the first large-scale biomedical Spanish language models trained from scratch, using biomedical corpora totalling 1.1B tokens and an EHR corpus of 95M tokens. We compared them against general-domain and other domain-specific models for Spanish on three clinical NER tasks. Our models are superior across all three NER tasks, making them better suited to clinical NLP applications. Furthermore, our findings indicate that, when enough data is available, pre-training from scratch outperforms continual pre-training on clinical tasks, raising an interesting research question about which approach is optimal. Our models and fine-tuning scripts are publicly available on HuggingFace and GitHub.
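Since the released checkpoints and fine-tuning scripts are on HuggingFace, a minimal sketch of loading one of them for a clinical NER (token classification) task might look like the following; the Hub identifier and label count are assumptions for illustration, so check the authors' HuggingFace page for the exact released names.

# Minimal sketch of loading a released biomedical Spanish checkpoint for
# clinical NER (token classification). The Hub id and label count are
# assumptions; see the authors' HuggingFace page for the actual names.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "PlanTL-GOB-ES/bsc-bio-ehr-es"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=5)

# Encode a pre-tokenized clinical sentence so subwords can be aligned to BIO tags.
words = ["Paciente", "con", "diabetes", "mellitus", "tipo", "2", "."]
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
logits = model(**encoding).logits
print(logits.shape)  # (1, number_of_subword_tokens, num_labels)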