Iván Cantador


2023

Dimensionality Reduction for Machine Learning-based Argument Mining
Andrés Segura-Tinoco | Iván Cantador
Proceedings of the 10th Workshop on Argument Mining

Recent approaches to argument mining have focused on training machine learning algorithms on annotated text corpora, using high-dimensional linguistic feature vectors as input. Unlike previous work, in this paper we present a preliminary investigation of the potential benefits of reducing the dimensionality of the input data. Through an empirical study that tests the SVD, PCA and LDA techniques on a new argumentative corpus in Spanish for an underexplored domain (e-participation), and that uses a novel, rich argument model, we show positive results in terms of both computational efficiency and argumentative information extraction effectiveness for the three major argument mining tasks: argumentative fragment detection, argument component classification, and argumentative relation recognition. In a space whose dimension is around 3-4% of the number of input features, the argument mining methods reach 95-97% of the performance achieved by using the entire corpus, and even surpass it in some cases.

2020

Exploiting Citation Knowledge in Personalised Recommendation of Recent Scientific Publications
Anita Khadka | Iván Cantador | Miriam Fernandez
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we address the problem of providing personalised recommendations of recent scientific publications to a particular user, and explore the use of citation knowledge to do so. For this purpose, we have generated a novel dataset that captures authors’ publication history and is enriched with different forms of paper citation knowledge, namely citation graphs, citation positions, citation contexts, and citation types. Through a number of empirical experiments on this dataset, we show that exploiting the extracted knowledge, particularly the type of citation, is a promising approach for recommending recently published papers that may not have been cited yet. The dataset, which we make publicly available, also represents a valuable resource for further research on academic information retrieval and filtering.