Lluís-F. Hurtado

Also published as: LLuís-F. Hurtado, Lluis F. Hurtado, Lluís F. Hurtado

2024

ELiRF-VRAIN at BioLaySumm: Boosting Lay Summarization Systems Performance with Ranking Models
Vicent Ahuir | Diego Torres | Encarna Segarra | Lluís-F. Hurtado
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

This paper presents our contribution to the BioLaySumm 2024 shared task of the 23rd BioNLP Workshop. The task is to create a lay summary, given a biomedical research article and its technical summary. As the input to the system could be large, a Longformer Encoder-Decoder (LED) has been used. We continuously pre-trained a general domain LED model with biomedical data to adapt it to this specific domain. In the pre-training phase, several pre-training tasks were aggregated to inject linguistic knowledge and increase the abstractivity of the generated summaries. Since the distribution of samples between the two datasets, eLife and PLOS, is unbalanced, we fine-tuned two models: one for eLife and another for PLOS. To increase the quality of the lay summaries of the system, we developed a regression model that helps us rank the summaries generated by the summarization models. This regression model predicts the quality of the summary in three different aspects: Relevance, Readability, and Factuality. We present the results of our models and a study to measure the ranking capabilities of the regression model.

2022

DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles
Encarnación Segarra Soriano | Vicent Ahuir | Lluís-F. Hurtado | José González
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The application of supervised methods to automatic summarization requires the availability of adequate corpora consisting of a set of document-summary pairs. As in most Natural Language Processing tasks, the great majority of available datasets for summarization are in English, making it difficult to develop automatic summarization models for other languages. Although Spanish is gradually forming part of some recent summarization corpora, it is not the same for minority languages such as Catalan. In this work, we describe the construction of a corpus of Catalan and Spanish newspapers, the Dataset for Automatic summarization of Catalan and Spanish newspaper Articles (DACSA) corpus. It is a high-quality large-scale corpus that can be used to train summarization models for Catalan and Spanish. We have carried out an analysis of the corpus, both in terms of the style of the summaries and the difficulty of the summarization task. In particular, we have used a set of well-known metrics in the summarization field in order to characterize the corpus. Additionally, for benchmarking purposes, we have evaluated the performances of some extractive and abstractive summarization systems on the DACSA corpus.

2019

ELiRF-UPV at SemEval-2019 Task 3: Snapshot Ensemble of Hierarchical Convolutional Neural Networks for Contextual Emotion Detection
José-Ángel González | Lluís-F. Hurtado | Ferran Pla
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the approach developed by the ELiRF-UPV team at SemEval 2019 Task 3: Contextual Emotion Detection in Text. We have developed a Snapshot Ensemble of 1D Hierarchical Convolutional Neural Networks to extract features from 3-turn conversations in order to perform contextual emotion detection in text. This Snapshot Ensemble is obtained by averaging the models selected by a Genetic Algorithm that optimizes the evaluation measure. The proposed ensemble obtains better results than a single model and it obtains competitive and promising results on Contextual Emotion Detection in Text.

2018

ELiRF-UPV at SemEval-2018 Tasks 1 and 3: Affect and Irony Detection in Tweets
José-Ángel González | Lluís-F. Hurtado | Ferran Pla
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the participation of the ELiRF-UPV team at Tasks 1 and 3 of SemEval-2018. We present a deep-learning-based system that assembles Convolutional Neural Networks and Long Short-Term Memory neural networks. This system has been used with slight modifications for the two tasks addressed, for both English and Spanish. Finally, the results obtained in the competition are reported and discussed.

ELiRF-UPV at SemEval-2018 Task 10: Capturing Discriminative Attributes with Knowledge Graphs and Wikipedia
José-Ángel González | Lluís-F. Hurtado | Encarna Segarra | Ferran Pla
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the participation of the ELiRF-UPV team at Task 10, Capturing Discriminative Attributes, of SemEval-2018. Our best approach consists of using ConceptNet, Wikipedia and NumberBatch embeddings in order to establish relationships between concepts and attributes. Furthermore, this system achieves competitive results in the official evaluation.

ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge
José-Ángel González | Lluís-F. Hurtado | Encarna Segarra | Ferran Pla
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the participation of the ELiRF-UPV team at Task 11, Machine Comprehension using Commonsense Knowledge, of SemEval-2018. Our approach is based on the use of word embeddings, NumberBatch embeddings, and a Deep Learning architecture to find the best answer for the multiple-choice questions based on the narrative text. The results obtained are in line with those obtained by the other participants and they encourage us to continue working on this problem.

2017

ELiRF-UPV at SemEval-2017 Task 7: Pun Detection and Interpretation
Lluís-F. Hurtado | Encarna Segarra | Ferran Pla | Pascual Carrasco | José-Ángel González
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the participation of the ELiRF-UPV team at Task 7 (subtask 2: homographic pun detection and subtask 3: homographic pun interpretation) of SemEval-2017. Our approach is based on the use of word embeddings to find related words in a sentence and a version of the Lesk algorithm to establish relationships between synsets. The results obtained are in line with those obtained by the other participants and they encourage us to continue working on this problem.

ELiRF-UPV at SemEval-2017 Task 4: Sentiment Analysis using Deep Learning
José-Ángel González | Ferran Pla | Lluís-F. Hurtado
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the participation of the ELiRF-UPV team at Task 4 of SemEval-2017. Our approach is based on the use of convolutional and recurrent neural networks and the combination of general and specific word embeddings with polarity lexicons. We participated in all of the proposed subtasks for both English and Arabic, using the same system with small variations.

In this paper, we present the acquisition and labeling processes of the EDECAN-SPORTS corpus, which is a corpus that is oriented to the development of multimodal dialog systems acquired in Spanish and Catalan. Two Wizards of Oz were used in order to better simulate the behavior of an actual system in terms of both the information used by the different modules and the communication mechanisms between these modules. User and system dialog-act labeling, as well as other information, have been obtained automatically using this acquisition method. Some preliminary experimental results with the acquired corpus show the appropriateness of the proposed acquisition method for the development of dialog systems.

2008

Acquisition and Evaluation of a Dialog Corpus through WOz and Dialog Simulation Techniques
David Griol | Lluís F. Hurtado | Encarna Segarra | Emilio Sanchis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present a comparison between two corpora acquired by means of two different techniques. The first corpus was acquired by means of the Wizard of Oz technique. A dialog simulation technique has been developed for the acquisition of the second corpus. A random selection of the user and system turns has been used, defining stop conditions for automatically deciding if the simulated dialog is successful or not. We use several evaluation measures proposed in previous research to compare our two acquired corpora, and then discuss the similarities and differences between them with regard to these measures.