David Pérez-Fernández

Also published as: David Perez Fernandez, David Pérez Fernández

2025

pdf bib abs
CASE: Large Scale Topic Exploitation for Decision Support Systems
Lorena Calvo Bartolomé | Jerónimo Arenas-García | David Pérez Fernández
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations

In recent years, there has been growing interest in using NLP tools for decision support systems, particularly in Science, Technology, and Innovation (STI). Among these, topic modeling has been widely used for analyzing large document collections, such as scientific articles, research projects, or patents, yet its integration into decision-making systems remains limited. This paper introduces CASE, a tool for exploiting topic information for semantic analysis of large corpora. The core of CASE is a Solr engine with a customized indexing strategy to represent information from Bayesian and Neural topic models that allow efficient topic-enriched searches. Through ad-hoc plug-ins, CASE enables topic inference on new texts and semantic search. We demonstrate the versatility and scalability of CASE through two use cases: the calculation of aggregated STI indicators and the implementation of a web service to help evaluate research projects.

pdf bib abs
TrustBoost: Balancing flexibility and compliance in conversational AI systems
David Griol | Zoraida Callejas | Manuel Gil-Martín | Ksenia Kharitonova | Juan Manuel Montero-Martínez | David Pérez Fernández | Fernando Fernández-Martínez
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

Conversational AI (ConvAI) systems are gaining growing importance as an alternative for more natural interaction with digital services. In this context, Large Language Models (LLMs) have opened new possibilities for less restricted interaction and richer natural language understanding. However, despite their advanced capabilities, LLMs can pose accuracy and reliability problems, as they sometimes generate factually incorrect or contextually inappropriate content that does not fulfill the regulations or business rules of a specific application domain. In addition, they still do not possess the capability to adjust to users’ needs and preferences, showing emotional awareness, while concurrently adhering to the regulations and limitations of their designated domain. In this paper we present the TrustBoost project, which addresses the challenge of improving trustworthiness of ConvAI from two dimensions: cognition (adaptability, flexibility, compliance, and performance) and affectivity (familiarity, emotional dimension, and perception). The duration of the project is from September 2024 to December 2027.

2020

pdf bib
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)
Doaa Samy | David Pérez-Fernández | Jerónimo Arenas-García
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

pdf bib abs
Legal-ES: A Set of Large Scale Resources for Spanish Legal Text Processing
Doaa Samy | Jerónimo Arenas-García | David Pérez-Fernández
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

Legal-ES is an open source resource kit for legal Spanish. It consists of a large scale Spanish corpus of open legal texts and different kinds of language models including word embeddings and topic models. The corpus includes over 1000 million words covering a collection of legislative and administrative open access documents in Spanish from different sources representing international, national and regional entities. The corpus is pre-processed and tokenized using Spacy. For the word embeddings, gensim was used on the collection of tokens, producing a representation space that is especially suited to reflect the inherent characteristics of the legal domain. We calculate also topic models to obtain a convenient tool to understand the main topics in the corpus and to navigate through the documents exploiting the semantic similarity among documents. We will analyse the time structure of a dynamic topic model to infer changes in the legal production of Spanish jurisdiction that have occurred over the analysed time framework.

2018

We describe the European Language Resources Infrastructure project, whose main aim is the provision of an infrastructure to help collect, prepare and share language resources that can in turn improve translation services in Europe.

Venues

Fix data