Alberto Díaz

Also published as: Alberto Diaz

2024

Automated Extraction of Prosodic Structure from Unannotated Sign Language Video
Antonio F. G. Sevilla | José María Lahoz-Bengoechea | Alberto Diaz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

As in oral phonology, prosody is an important carrier of linguistic information in sign languages. One of the most prominent ways this reveals itself is in the time structure of signs: their rhythm and intensity of articulation. To be able to empirically see these effects, the velocity of the hands can be computed throughout the execution of a sign. In this article, we propose a method for extracting this information from unlabeled videos of sign language, exploiting CoTracker, a recent advancement in computer vision which can track every point in a video without the need of any calibration or fine-tuning. The dominant hand is identified via clustering of the computed point velocities, and its dynamic profile plotted to make apparent the prosodic structure of signing. We apply our method to different datasets and sign languages, and perform a preliminary visual exploration of results. This exploration supports the usefulness of our methodology for linguistic analysis, though issues to be tackled remain, such as bi-manual signs and a formal and numerical evaluation of accuracy. Nonetheless, the absence of any preprocessing requirements may make it useful for other researchers and datasets.

2016

pdf bib abs

Improving Information Extraction from Wikipedia Texts using Basic English
Teresa Rodríguez-Ferreira | Adrián Rabadán | Raquel Hervás | Alberto Díaz
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The aim of this paper is to study the effect that the use of Basic English versus common English has on information extraction from online resources. The amount of online information available to the public grows exponentially, and is potentially an excellent resource for information extraction. The problem is that this information often comes in an unstructured format, such as plain text. In order to retrieve knowledge from this type of text, it must first be analysed to find the relevant details, and the nature of the language used can greatly impact the quality of the extracted information. In this paper, we compare triplets that represent definitions or properties of concepts obtained from three online collaborative resources (English Wikipedia, Simple English Wikipedia and Simple English Wiktionary) and study the differences in the results when Basic English is used instead of common English. The results show that resources written in Basic English produce less quantity of triplets, but with higher quality.

This paper presents the process of development and the characteristics of an evaluation collection for a personalisation system for digital newspapers. This system selects, adapts and presents contents according to a user model that define information needs. The collection presented here contains data that are cross-related over four different axes: a set of news items from an electronic newspaper, collected into subsets corresponding to a particular sequence of days, packaged together and cross-indexed with a set of user profiles that represent the particular evolution of interests of a set of real users over the given days, expressed in each case according to four different representation frameworks: newspaper sections, Yahoo categories, keywords, and relevance feedback over the set of news items for the previous day. This information provides a minimum starting material over which one can evaluate for a given system how it addresses the first two observations - adapting to different users and adapting to particular users over time - providing that the particular system implements the representation of information needs according to the four frameworks employed in the collection. This collection has been successfully used to perform some different experiments to determine the effectiveness of the personalization system presented.

pdf bib

Improving Summarization of Biomedical Documents Using Word Sense Disambiguation
Laura Plaza | Mark Stevenson | Alberto Díaz
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing