Jose Cols


2025

SUWMIT at BioLaySumm2025: Instruction-based Summarization with Contrastive Decoding
Priyam Basu | Jose Cols | Daniel Jarvis | Yongsin Park | Daniel Rodabaugh
BioNLP 2025 Shared Tasks

We present our team’s approach to subtask 1.1 of the BioLaySumm 2025 shared task, which entails the automated generation of lay summaries from biomedical articles. To this end, we experiment with a variety of methods for text preprocessing, extractive summarization, model fine-tuning, and abstractive summarization. Our final results are generated with a fine-tuned Llama 3.1 Instruct (8B) model, which achieved top scores on two of the four relevance metrics and the highest overall ranking among this year’s participating teams on the plain lay summarization subtask.

2024

Spanish Corpus and Provenance with Computer-Aided Translation for the WMT24 OLDI Shared Task
Jose Cols
Proceedings of the Ninth Conference on Machine Translation

This paper presents the Seed-CAT submission to the WMT24 Open Language Data Initiative shared task. We detail our data collection method, which involves a computer-aided translation tool developed explicitly for translating Seed corpora. We release a professionally translated Spanish corpus and a provenance dataset documenting the translation process. The quality of the data was validated on the FLORES+ benchmark with English-Spanish neural machine translation models, achieving an average chrF++ score of 34.9.