Aiala Rosá

2020

HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish
Luis Chiruzzo | Santiago Castro | Aiala Rosá
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents the development of a corpus of 30,000 Spanish tweets that were crowd-annotated with humor value and funniness score. About 38.6% of the tweets in the corpus are humorous, and the humorous tweets have an average score of 2.04 on a scale from 1 to 5. The corpus has been used in an automatic humor recognition and analysis competition, with encouraging results from the participants.

2018

A Crowd-Annotated Spanish Corpus for Humor Analysis
Santiago Castro | Luis Chiruzzo | Aiala Rosá | Diego Garat | Guillermo Moncecchi
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. Inter-annotator agreement, measured with Krippendorff’s alpha, is 0.5710. The dataset is available for general usage and can serve as a basis for humor detection and as a first step to tackle subjectivity.

A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese
Santiago Castro | Jairo Bonanata | Aiala Rosá
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

False friends are words in two languages that look or sound similar but have different meanings. They are a common source of confusion among language learners. Methods to detect them automatically do exist; however, they either rely on large aligned bilingual corpora, which are hard to find and expensive to build, or have trouble dealing with infrequent words. In this work we propose a high coverage method that uses word vector representations to build a false friends classifier for any pair of languages, which we apply to the particular case of Spanish and Portuguese. The required resources are a large corpus for each language and a small bilingual lexicon for the pair.

2016

Factuality Annotation and Learning in Spanish Texts
Dina Wonsever | Aiala Rosá | Marisa Malcuori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a proposal for the annotation of factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. In addition, some learning experiments (SVM and CRF) were carried out, showing encouraging results.

2010

Opinion Identification in Spanish Texts
Aiala Rosá | Dina Wonsever | Jean-Luc Minel
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

2008

Identification automatique de marques d’opinion dans des textes
Aiala Rosá
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

We present a conceptual model for representing opinions, analyzing the elements that compose them and some of their properties. This conceptual model is implemented, and we describe its annotation scheme. Spanish texts are annotated automatically by applying contextual rules. A first subset of rules has been written to identify some elements of the model. We analyze the first results of applying them.

Co-authors

Diego Garat 1

Guillermo Moncecchi 1

Jairo Bonanata 1