Paolo Rosso


2024

pdf
PyRater: A Python Toolkit for Annotation Analysis
Angelo Basile | Marc Franco-Salvador | Paolo Rosso
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We introduce PyRater, an open-source Python toolkit designed for analysing corpora annotations. When creating new annotated language resources, probabilistic models of annotation are the state-of-the-art solution for identifying the best annotators, retrieving the gold standard, and more generally separating annotation signal from noise. PyRater offers a unified interface for several such models and includes an API for the addition of new ones. Additionally, the toolkit has built-in functions to read datasets with multiple annotations and plot the analysis outcomes. In this work, we also demonstrate a novel application of PyRater to zero-shot classifiers, where it effectively selects the best-performing prompt. We make PyRater available to the research community.

pdf
RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian
Adrian Cosma | Ioan-Bogdan Iordache | Paolo Rosso
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recently, large language models (LLMs) have become increasingly powerful and have become capable of solving a plethora of tasks through proper instructions in natural language. However, the vast majority of testing suites assume that the instructions are written in English, the de facto prompting language. Code intelligence and problem solving still remain a difficult task, even for the most advanced LLMs. Currently, there are no datasets to measure the generalization power for code-generation models in a language other than English. In this work, we present RoCode, a competitive programming dataset, consisting of 2,642 problems written in Romanian, 11k solutions in C, C++ and Python and comprehensive testing suites for each problem. The purpose of RoCode is to provide a benchmark for evaluating the code intelligence of language models trained on Romanian / multilingual text as well as a fine-tuning set for pretrained Romanian models. Through our results and review of related works, we argue for the need to develop code models for languages other than English.

pdf
Soft metrics for evaluation with disagreements: an assessment
Giulia Rizzi | Elisa Leonardelli | Massimo Poesio | Alexandra Uma | Maja Pavlovic | Silviu Paun | Paolo Rosso | Elisabetta Fersini
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation such as Cross Entropy satisfy such properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real case scenario. Our results indicate that Cross Entropy can result in fairly paradoxical results in some cases, whereas other measures Manhattan distance and Euclidean distance exhibit a more intuitive behavior, at least for the case of binary classification.

2023

pdf
Definitions Matter: Guiding GPT for Multi-label Classification
Youri Peskine | Damir Korenčić | Ivan Grubisic | Paolo Papotti | Raphael Troncy | Paolo Rosso
Findings of the Association for Computational Linguistics: EMNLP 2023

Large language models have recently risen in popularity due to their ability to perform many natural language tasks without requiring any fine-tuning. In this work, we focus on two novel ideas: (1) generating definitions from examples and using them for zero-shot classification, and (2) investigating how an LLM makes use of the definitions. We thoroughly analyze the performance of GPT-3 model for fine-grained multi-label conspiracy theory classification of tweets using zero-shot labeling. In doing so, we asses how to improve the labeling by providing minimal but meaningful context in the form of the definitions of the labels. We compare descriptive noun phrases, human-crafted definitions, introduce a new method to help the model generate definitions from examples, and propose a method to evaluate GPT-3’s understanding of the definitions. We demonstrate that improving definitions of class labels has a direct consequence on the downstream classification results.

pdf
Zero-Shot Data Maps. Efficient Dataset Cartography Without Model Training
Angelo Basile | Marc Franco-Salvador | Paolo Rosso
Findings of the Association for Computational Linguistics: EMNLP 2023

Data Maps (Swayamdipta, et al. 2020) have emerged as a powerful tool for diagnosing large annotated datasets. Given a model fitted on a dataset, these maps show each data instance from the dataset in a 2-dimensional space defined by a) the model’s confidence in the true class and b) the variability of this confidence. In previous work, confidence and variability are usually computed using training dynamics, which requires the fitting of a strong model to the dataset. In this paper, we introduce a novel approach: Zero-Shot Data Maps based on fast bi-encoder networks. For each data point, confidence on the true label and variability are computed over the members of an ensemble of zero-shot models constructed with different — but semantically equivalent — label descriptions, i.e., textual representations of each class in a given label space. We conduct a comparative analysis of maps compiled using traditional training dynamics and our proposed zero-shot models across various datasets. Our findings reveal that Zero-Shot Data Maps generally match those produced by the traditional method while delivering up to a 14x speedup. The code is available [here](https://github.com/symanto-research/zeroshot-cartography).

pdf
An Active Learning Pipeline for NLU Error Detection in Conversational Agents
Damian Pascual | Aritz Bercher | Akansha Bhardwaj | Mingbo Cui | Dominic Kohler | Liam Van Der Poel | Paolo Rosso
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

High-quality labeled data is paramount to the performance of modern machine learning models. However, annotating data is a time-consuming and costly process that requires human experts to examine large collections of raw data. For conversational agents in production settings with access to large amounts of user-agent conversations, the challenge is to decide what data should be annotated first. We consider the Natural Language Understanding (NLU) component of a conversational agent deployed in a real-world setup with limited resources. We present an active learning pipeline for offline detection of classification errors that leverages two strong classifiers. Then, we perform topic modeling on the potentially mis-classified samples to ease data analysis and to reveal error patterns. In our experiments, we show on a real-world dataset that by using our method to prioritize data annotation we reach 100% of the performance annotating only 36% of the data. Finally, we present an analysis of some of the error patterns revealed and argue that our pipeline is a valuable tool to detect critical errors and reduce the workload of annotators.

pdf
MIND at SemEval-2023 Task 11: From Uncertain Predictions to Subjective Disagreement
Giulia Rizzi | Alessandro Astorino | Daniel Scalena | Paolo Rosso | Elisabetta Fersini
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes the participation of the research laboratory MIND, at the University of Milano-Bicocca, in the SemEval 2023 task related to Learning With Disagreements (Le-Wi-Di). The main goal is to identify the level of agreement/disagreement from a collection of textual datasets with different characteristics in terms of style, language and task. The proposed approach is grounded on the hypothesis that the disagreement between annotators could be grasped by the uncertainty that a model, based on several linguistic characteristics, could have on the prediction of a given gold label.

pdf
Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection
Gretel De la Peña Sarracén | Paolo Rosso | Robert Litschko | Goran Glavaš | Simone Ponzetto
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results. However, the scarcity of resources in target languages remains a challenge. In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection. For data augmentation, we analyze two existing techniques based on vicinal risk minimization and propose MIXAG, a novel data augmentation method which interpolates pairs of instances based on the angle of their representations. Our experiments involve seven languages typologically distinct from English and three different domains. The results reveal that the data augmentation strategies can enhance few-shot cross-lingual abusive language detection. Specifically, we observe that consistently in all target languages, MIXAG improves significantly in multidomain and multilingual environments. Finally, we show through an error analysis how the domain adaptation can favour the class of abusive texts (reducing false negatives), but at the same time, declines the precision of the abusive language detection model.

pdf
Offensive Language Detection in Arabizi
Imene Bensalem | Meryem Mout | Paolo Rosso
Proceedings of ArabicNLP 2023

Detecting offensive language in under-resourced languages presents a significant real-world challenge for social media platforms. This paper is the first work focused on the issue of offensive language detection in Arabizi, an under-explored topic in an under-resourced form of Arabic. For the first time, a comprehensive and critical overview of the existing work on the topic is presented. In addition, we carry out experiments using different BERT-like models and show the feasibility of detecting offensive language in Arabizi with high accuracy. Throughout a thorough analysis of results, we emphasize the complexities introduced by dialect variations and out-of-domain generalization. We use in our experiments a dataset that we have constructed by leveraging existing, albeit limited, resources. To facilitate further research, we make this dataset publicly accessible to the research community.

2022

pdf
SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification
Elisabetta Fersini | Francesca Gasparini | Giulia Rizzi | Aurora Saibene | Berta Chulvi | Paolo Rosso | Alyssa Lees | Jeffrey Sorensen
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The paper describes the SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI),which explores the detection of misogynous memes on the web by taking advantage of available texts and images. The task has been organised in two related sub-tasks: the first one is focused on recognising whether a meme is misogynous or not (Sub-task A), while the second one is devoted to recognising types of misogyny (Sub-task B). MAMI has been one of the most popular tasks at SemEval-2022 with more than 400 participants, 65 teams involved in Sub-task A and 41 in Sub-task B from 13 countries. The MAMI challenge received 4214 submitted runs (of which 166 uploaded on the leader-board), denoting an enthusiastic participation for the proposed problem. The collection and annotation is described for the task dataset. The paper provides an overview of the systems proposed for the challenge, reports the results achieved in both sub-tasks and outlines a description of the main errors for a comprehension of the systems capabilities and for detailing future research perspectives.

pdf
Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection
Gretel Liz De la Peña Sarracén | Paolo Rosso
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Hate speech detection is a prominent and challenging task, since hate messages are often expressed in subtle ways and with characteristics that may vary depending on the author. Hence, many models suffer from the generalization problem. However, retrieving and monitoring hateful content on social media is a current necessity. In this paper, we propose an unsupervised approach using Graph Auto-Encoders (GAE), which allows us to avoid using labeled data when training the representation of the texts. Specifically, we represent texts as nodes of a graph, and use a transformer layer together with a convolutional layer to encode these nodes in a low-dimensional space. As a result, we obtain embeddings that can be decoded into a reconstruction of the original network. Our main idea is to learn a model with a set of texts without supervision, in order to generate embeddings for the nodes: nodes with the same label should be close in the embedding space, which, in turn, should allow us to distinguish among classes. We employ this strategy to detect hate speech in multi-domain and multilingual sets of texts, where our method shows competitive results on small datasets.

pdf
Multi-Aspect Transfer Learning for Detecting Low Resource Mental Disorders on Social Media
Ana Sabina Uban | Berta Chulvi | Paolo Rosso
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Mental disorders are a serious and increasingly relevant public health issue. NLP methods have the potential to assist with automatic mental health disorder detection, but building annotated datasets for this task can be challenging; moreover, annotated data is very scarce for disorders other than depression. Understanding the commonalities between certain disorders is also important for clinicians who face the problem of shifting standards of diagnosis. We propose that transfer learning with linguistic features can be useful for approaching both the technical problem of improving mental disorder detection in the context of data scarcity, and the clinical problem of understanding the overlapping symptoms between certain disorders. In this paper, we target four disorders: depression, PTSD, anorexia and self-harm. We explore multi-aspect transfer learning for detecting mental disorders from social media texts, using deep learning models with multi-aspect representations of language (including multiple types of interpretable linguistic features). We explore different transfer learning strategies for cross-disorder and cross-platform transfer, and show that transfer learning can be effective for improving prediction performance for disorders where little annotated data is available. We offer insights into which linguistic features are the most useful vehicles for transferring knowledge, through ablation experiments, as well as error analysis.

pdf
FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Flora Sakketou | Joan Plepi | Riccardo Cervero | Henri Jacques Geiss | Paolo Rosso | Lucie Flek
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts, and includes, beyond the users’ binary labels, also their fine-grained credibility level (very low to very high) and their political bias strength (extreme right to extreme left). As far as we are aware, this is the first fake news spreader dataset that simultaneously captures both the long-term context of users’ historical posts and the interactions between them. To create the first benchmark on our data, we provide methods for identifying misinformation spreaders by utilizing the social connections between the users along with their psycho-linguistic features. We show that the users’ social interactions can, on their own, indicate misinformation spreading, while the psycho-linguistic features are mostly informative in non-neural classification settings. In a qualitative analysis we observe that detecting affective mental processes correlates negatively with right-biased users, and that the openness to experience factor is lower for those who spread fake news.

pdf
UPV at the Arabic Hate Speech 2022 Shared Task: Offensive Language and Hate Speech Detection using Transformers and Ensemble Models
Angel Felipe Magnossão de Paula | Paolo Rosso | Imene Bensalem | Wajdi Zaghouani
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

This paper describes our participation in the shared task Fine-Grained Hate Speech Detection on Arabic Twitter at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT). The shared task is divided into three detection subtasks: (i) Detect whether a tweet is offensive or not; (ii) Detect whether a tweet contains hate speech or not; and (iii) Detect the fine-grained type of hate speech (race, religion, ideology, disability, social class, and gender). It is an effort toward the goal of mitigating the spread of offensive language and hate speech in Arabic-written content on social media platforms. To solve the three subtasks, we employed six different transformer versions: AraBert, AraElectra, Albert-Arabic, AraGPT2, mBert, and XLM-Roberta. We experimented with models based on encoder and decoder blocks and models exclusively trained on Arabic and also on several languages. Likewise, we applied two ensemble methods: Majority vote and Highest sum. Our approach outperformed the official baseline in all the subtasks, not only considering F1-macro results but also accuracy, recall, and precision. The results suggest that the Highest sum is an excellent approach to encompassing transformer output to create an ensemble since this method offered at least top-two F1-macro values across all the experiments performed on development and test data.

pdf bib
Do Dependency Relations Help in the Task of Stance Detection?
Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso
Proceedings of the Third Workshop on Insights from Negative Results in NLP

In this paper we present a set of multilingual experiments tackling the task of Stance Detection in five different languages: English, Spanish, Catalan, French and Italian. Furthermore, we study the phenomenon of stance with respect to six different targets – one per language, and two different for Italian – employing a variety of machine learning algorithms that primarily exploit morphological and syntactic knowledge as features, represented throughout the format of Universal Dependencies. Results seem to suggest that the methodology employed is not beneficial per se, but might be useful to exploit the same features with a different methodology.

pdf
Cryptocurrency Bubble Detection: A New Stock Market Dataset, Financial Task & Hyperbolic Models
Ramit Sawhney | Shivam Agarwal | Vivek Mittal | Paolo Rosso | Vikram Nanda | Sudheer Chava
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The rapid spread of information over social media influences quantitative trading and investments. The growing popularity of speculative trading of highly volatile assets such as cryptocurrencies and meme stocks presents a fresh challenge in the financial realm. Investigating such “bubbles” - periods of sudden anomalous behavior of markets are critical in better understanding investor behavior and market dynamics. However, high volatility coupled with massive volumes of chaotic social media texts, especially for underexplored assets like cryptocoins pose a challenge to existing methods. Taking the first step towards NLP for cryptocoins, we present and publicly release CryptoBubbles, a novel multi- span identification task for bubble detection, and a dataset of more than 400 cryptocoins from 9 exchanges over five years spanning over two million tweets. Further, we develop a set of sequence-to-sequence hyperbolic models suited to this multi-span identification task based on the power-law dynamics of cryptocurrencies and user behavior on social media. We further test the effectiveness of our models under zero-shot settings on a test set of Reddit posts pertaining to 29 “meme stocks”, which see an increase in trade volume due to social media hype. Through quantitative, qualitative, and zero-shot analyses on Reddit and Twitter spanning cryptocoins and meme-stocks, we show the practical applicability of CryptoBubbles and hyperbolic models.

2021

pdf
Masking and Transformer-based Models for Hyperpartisanship Detection in News
Javier Sánchez-Junquera | Paolo Rosso | Manuel Montes-y-Gómez | Simone Paolo Ponzetto
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Hyperpartisan news show an extreme manipulation of reality based on an underlying and extreme ideological orientation. Because of its harmful effects at reinforcing one’s bias and the posterior behavior of people, hyperpartisan news detection has become an important task for computational linguists. In this paper, we evaluate two different approaches to detect hyperpartisan news. First, a text masking technique that allows us to compare style vs. topic-related features in a different perspective from previous work. Second, the transformer-based models BERT, XLM-RoBERTa, and M-BERT, known for their ability to capture semantic and syntactic patterns in the same representation. Our results corroborate previous research on this task in that topic-related features yield better results than style-based ones, although they also highlight the relevance of using higher-length n-grams. Furthermore, they show that transformer-based models are more effective than traditional methods, but this at the cost of greater computational complexity and lack of transparency. Based on our experiments, we conclude that the beginning of the news show relevant information for the transformers at distinguishing effectively between left-wing, mainstream, and right-wing orientations.

pdf
FakeFlow: Fake News Detection by Modeling the Flow of Affective Information
Bilal Ghanem | Simone Paolo Ponzetto | Paolo Rosso | Francisco Rangel
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fake news articles often stir the readers’ attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers’ emotions. To capture this, we propose in this paper to model the flow of affective information in fake news articles using a neural architecture. The proposed model, FakeFlow, learns this flow by combining topic and affective information extracted from text. We evaluate the model’s performance with several experiments on four real-world datasets. The results show that FakeFlow achieves superior results when compared against state-of-the-art methods, thus confirming the importance of capturing the flow of the affective information in news articles.

pdf
RoMa at SemEval-2021 Task 7: A Transformer-based Approach for Detecting and Rating Humor and Offense
Roberto Labadie | Mariano Jason Rodriguez | Reynier Ortega | Paolo Rosso
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper we describe the systems used by the RoMa team in the shared task on Detecting and Rating Humor and Offense (HaHackathon) at SemEval 2021. Our systems rely on data representations learned through fine-tuned neural language models. Particularly, we explore two distinct architectures. The first one is based on a Siamese Neural Network (SNN) combined with a graph-based clustering method. The SNN model is used for learning a latent space where instances of humor and non-humor can be distinguished. The clustering method is applied to build prototypes of both classes which are used for training and classifying new messages. The second one combines neural language model representations with a linear regression model which makes the final ratings. Our systems achieved the best results for humor classification using model one, whereas for offensive and humor rating the second model obtained better performance. In the case of the controversial humor prediction, the most significant improvement was achieved by a fine-tuning of the neural language model. In general, the results achieved are encouraging and give us a starting point for further improvements.

pdf
Understanding Patterns of Anorexia Manifestations in Social Media Data with Deep Learning
Ana Sabina Uban | Berta Chulvi | Paolo Rosso
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

Eating disorders are a growing problem especially among young people, yet they have been under-studied in computational research compared to other mental health disorders such as depression. Computational methods have a great potential to aid with the automatic detection of mental health problems, but state-of-the-art machine learning methods based on neural networks are notoriously difficult to interpret, which is a crucial problem for applications in the mental health domain. We propose leveraging the power of deep learning models for automatically detecting signs of anorexia based on social media data, while at the same time focusing on interpreting their behavior. We train a hierarchical attention network to detect people with anorexia and use its internal encodings to discover different clusters of anorexia symptoms. We interpret the identified patterns from multiple perspectives, including emotion expression, psycho-linguistic features and personality traits, and we offer novel hypotheses to interpret our findings from a psycho-social perspective. Some interesting findings are patterns of word usage in some users with anorexia which show that they feel less as being part of a group compared to control cases, as well as that they have abandoned explanatory activity as a result of a greater feeling of helplessness and fear.

2020

pdf
PRHLT-UPV at SemEval-2020 Task 8: Study of Multimodal Techniques for Memes Analysis
Gretel Liz De la Peña Sarracén | Paolo Rosso | Anastasia Giachanou
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system submitted by the PRHLT-UPV team for the task 8 of SemEval-2020: Memotion Analysis. We propose a multimodal model that combines pretrained models of the BERT and VGG architectures. The BERT model is used to process the textual information and VGG the images. The multimodal model is used to classify memes according to the presence of offensive, sarcastic, humorous and motivating content. Also, a sentiment analysis of memes is carried out with the proposed model. In the experiments, the model is compared with other approaches to analyze the relevance of the multimodal model. The results show encouraging performances on the final leaderboard of the competition, reaching good positions in the ranking of systems.

pdf
LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis
Somnath Banerjee | Sahar Ghannay | Sophie Rosset | Anne Vilnat | Paolo Rosso
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the participation of LIMSI_UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text. The proposed approach competed in SentiMix HindiEnglish subtask, that addresses the problem of predicting the sentiment of a given Hindi-English code-mixed tweet. We propose Recurrent Convolutional Neural Network that combines both the recurrent neural network and the convolutional network to better capture the semantics of the text, for code-mixed sentiment analysis. The proposed system obtained 0.69 (best run) in terms of F1 score on the given test data and achieved the 9th place (Codalab username: somban) in the SentiMix Hindi-English subtask.

pdf
PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection
Gretel Liz De la Peña Sarracén | Paolo Rosso
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The present paper describes the system submitted by the PRHLT-UPV team for the task 12 of SemEval-2020: OffensEval 2020. The official title of the task is Multilingual Offensive Language Identification in Social Media, and aims to identify offensive language in texts. The languages included in the task are English, Arabic, Danish, Greek and Turkish. We propose a model based on the BERT architecture for the analysis of texts in English. The approach leverages knowledge within a pre-trained model and performs fine-tuning for the particular task. In the analysis of the other languages the Multilingual BERT is used, which has been pre-trained for a large number of languages. In the experiments, the proposed method for English texts is compared with other approaches to analyze the relevance of the architecture used. Furthermore, simple models for the other languages are evaluated to compare them with the proposed one. The experimental results show that the model based on BERT outperforms other approaches. The main contribution of this work lies in this study, despite not obtaining the first positions in most cases of the competition ranking.

pdf bib
Profiling Bots, Fake News Spreaders and Haters
Paolo Rosso
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

Author profiling studies how language is shared by people. Stylometry techniques help in identifying aspects such as gender, age, native language, or even personality. Author profiling is a problem of growing importance, not only in marketing and forensics, but also in cybersecurity. The aim is not only to identify users whose messages are potential threats from a terrorism viewpoint but also those whose messages are a threat from a social exclusion perspective because containing hate speech, cyberbullying etc. Bots often play a key role in spreading hate speech, as well as fake news, with the purpose of polarizing the public opinion with respect to controversial issues like Brexit or the Catalan referendum. For instance, the authors of a recent study about the 1 Oct 2017 Catalan referendum, showed that in a dataset with 3.6 million tweets, about 23.6% of tweets were produced by bots. The target of these bots were pro-independence influencers that were sent negative, emotional and aggressive hateful tweets with hashtags such as #sonunesbesties (i.e. #theyareanimals). Since 2013 at the PAN Lab at CLEF (https://pan.webis.de/) we have addressed several aspects of author profiling in social media. In 2019 we investigated the feasibility of distinguishing whether the author of a Twitter feed is a bot, while this year we are addressing the problem of profiling those authors that are more likely to spread fake news in Twitter because they did in the past. We aim at identifying possible fake news spreaders as a first step towards preventing fake news from being propagated among online users (fake news aim to polarize the public opinion and may contain hate speech). In 2021 we specifically aim at addressing the challenging problem of profiling haters in social media in order to monitor abusive language and prevent cases of social exclusion in order to combat, for instance, racism, xenophobia and misogyny. Although we already started addressing the problem of detecting hate speech when targets are immigrants or women at the HatEval shared task in SemEval-2019, and when targets are women also in the Automatic Misogyny Identification tasks at IberEval-2018, Evalita-2018 and Evalita-2020, it was not done from an author profiling perspective. At the end of the keynote, I will present some insights in order to stress the importance of monitoring abusive language in social media, for instance, in foreseeing sexual crimes. In fact, previous studies confirmed that a correlation might lay between the yearly per capita rate of rape and the misogynistic language used in Twitter.

pdf
Marking Irony Activators in a Universal Dependencies Treebank: The Case of an Italian Twitter Corpus
Alessandra Teresa Cignarella | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso
Proceedings of the Twelfth Language Resources and Evaluation Conference

The recognition of irony is a challenging task in the domain of Sentiment Analysis, and the availability of annotated corpora may be crucial for its automatic processing. In this paper we describe a fine-grained annotation scheme centered on irony, in which we highlight the tokens that are responsible for its activation, (irony activators) and their morpho-syntactic features. As our case study we therefore introduce a recently released Universal Dependencies treebank for Italian which includes ironic tweets: TWITTIRÒ-UD. For the purposes of this study, we enriched the existing annotation in the treebank, with a further level that includes irony activators. A description and discussion of the annotation scheme is provided with a definition of irony activators and the guidelines for their annotation. This qualitative study on the different layers of annotation applied on the same dataset can shed some light on the process of human annotation, and irony annotation in particular, and on the usefulness of this representation for developing computational models of irony to be used for training purposes.

pdf
Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella | Valerio Basile | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso | Farah Benamara
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

2019

pdf
SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter
Valerio Basile | Cristina Bosco | Elisabetta Fersini | Debora Nozza | Viviana Patti | Francisco Manuel Rangel Pardo | Paolo Rosso | Manuela Sanguinetti
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.

pdf
DeepAnalyzer at SemEval-2019 Task 6: A deep learning-based ensemble method for identifying offensive tweets
Gretel Liz De la Peña | Paolo Rosso
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the system we developed for SemEval 2019 on Identifying and Categorizing Offensive Language in Social Media (OffensEval - Task 6). The task focuses on offensive language in tweets. It is organized into three sub-tasks for offensive language identification; automatic categorization of offense types and offense target identification. The approach for the first subtask is a deep learning-based ensemble method which uses a Bidirectional LSTM Recurrent Neural Network and a Convolutional Neural Network. Additionally we use the information from part-of-speech tagging of tweets for target identification and combine previous results for categorization of offense types.

pdf
TUVD team at SemEval-2019 Task 6: Offense Target Identification
Elena Shushkevich | John Cardiff | Paolo Rosso
Proceedings of the 13th International Workshop on Semantic Evaluation

This article presents our approach for detecting a target of offensive messages in Twitter, including Individual, Group and Others classes. The model we have created is an ensemble of simpler models, including Logistic Regression, Naive Bayes, Support Vector Machine and the interpolation between Logistic Regression and Naive Bayes with 0.25 coefficient of interpolation. The model allows us to achieve 0.547 macro F1-score.

pdf
UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification
Bilal Ghanem | Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso | Francisco Manuel Rangel Pardo
Proceedings of the 13th International Workshop on Semantic Evaluation

In the present paper we describe the UPV-28-UNITO system’s submission to the RumorEval 2019 shared task. The approach we applied for addressing both the subtasks of the contest exploits both classical machine learning algorithms and word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-structural and Twitter-based. A novel set of features that take advantage of the syntactic information in texts is moreover introduced in the paper.

pdf
Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies
Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

2018

pdf
Cross-corpus Native Language Identification via Statistical Embedding
Francisco Rangel | Paolo Rosso | Julian Brooke | Alexandra Uitdenbogerd
Proceedings of the Second Workshop on Stylistic Variation

In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained with available data and has to predict the native language from data of a different corpus. The motivation behind this study is to investigate native language identification in the Australian academic scenario where a majority of students come from China, Indonesia, and Arabic-speaking nations. We have proposed a statistical embedding representation reporting a significant improvement over common single-layer approaches of the state of the art, identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach was shown to be competitive even when the data is scarce and imbalanced.

pdf
Stance Detection in Fake News A Combined Feature Representation
Bilal Ghanem | Paolo Rosso | Francisco Rangel
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

With the uncontrolled increasing of fake news and rumors over the Web, different approaches have been proposed to address the problem. In this paper, we present an approach that combines lexical, word embeddings and n-gram features to detect the stance in fake news. Our approach has been tested on the Fake News Challenge (FNC-1) dataset. Given a news title-article pair, the FNC-1 task aims at determining the relevance of the article and the title. Our proposed approach has achieved an accurate result (59.6 % Macro F1) that is close to the state-of-the-art result with 0.013 difference using a simple feature representation. Furthermore, we have investigated the importance of different lexicons in the detection of the classification labels.

pdf
CATS: A Tool for Customized Alignment of Text Simplification Corpora
Sanja Štajner | Marc Franco-Salvador | Paolo Rosso | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
LDR at SemEval-2018 Task 3: A Low Dimensional Text Representation for Irony Detection
Bilal Ghanem | Francisco Rangel | Paolo Rosso
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper we describe our participation in the SemEval-2018 task 3 Shared Task on Irony Detection. We have approached the task with our low dimensionality representation method (LDR), which exploits low dimensional features extracted from text on the basis of the occurrence probability of the words depending on each class. Our intuition is that words in ironic texts have different probability of occurrence than in non-ironic ones. Our approach obtained acceptable results in both subtasks A and B. We have performed an error analysis that shows the difference on correct and incorrect classified tweets.

pdf
INAOE-UPV at SemEval-2018 Task 3: An Ensemble Approach for Irony Detection in Twitter
Delia Irazú Hernández Farías | Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes an ensemble approach to the SemEval-2018 Task 3. The proposed method is composed of two renowned methods in text classification together with a novel approach for capturing ironic content by exploiting a tailored lexicon for irony detection. We experimented with different ensemble settings. The obtained results show that our method has a good performance for detecting the presence of ironic content in Twitter.

pdf
ValenTO at SemEval-2018 Task 3: Exploring the Role of Affective Content for Detecting Irony in English Tweets
Delia Irazú Hernández Farías | Viviana Patti | Paolo Rosso
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper we describe the system used by the ValenTO team in the shared task on Irony Detection in English Tweets at SemEval 2018. The system takes as starting point emotIDM, an irony detection model that explores the use of affective features based on a wide range of lexical resources available for English, reflecting different facets of affect. We experimented with different settings, by exploiting different classifiers and features, and participated both to the binary irony detection task and to the task devoted to distinguish among different types of irony. We report on the results obtained by our system both in a constrained setting and unconstrained setting, where we explored the impact of using additional data in the training phase, such as corpora annotated for the presence of irony or sarcasm from the state of the art. Overall, the performance of our system seems to validate the important role that affective information has for identifying ironic content in Twitter.

2017

pdf
Single and Cross-domain Polarity Classification using String Kernels
Rosa M. Giménez-Pérez | Marc Franco-Salvador | Paolo Rosso
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

The polarity classification task aims at automatically identifying whether a subjective text is positive or negative. When the target domain is different from those where a model was trained, we refer to a cross-domain setting. That setting usually implies the use of a domain adaptation method. In this work, we study the single and cross-domain polarity classification tasks from the string kernels perspective. Contrary to classical domain adaptation methods, which employ texts from both domains to detect pivot features, we do not use the target domain for training. Our approach detects the lexical peculiarities that characterise the text polarity and maps them into a domain independent space by means of kernel discriminant analysis. Experimental results show state-of-the-art performance in single and cross-domain polarity classification.

pdf
Convolutional Neural Networks for Authorship Attribution of Short Texts
Prasha Shrestha | Sebastian Sierra | Fabio González | Manuel Montes | Paolo Rosso | Thamar Solorio
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.

pdf
Author Profiling at PAN: from Age and Gender Identification to Language Variety Identification (invited talk)
Paolo Rosso
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

Author profiling is the study of how language is shared by people, a problem of growing importance in applications dealing with security, in order to understand who could be behind an anonymous threat message, and marketing, where companies may be interested in knowing the demographics of people that in online reviews liked or disliked their products. In this talk we will give an overview of the PAN shared tasks that since 2013 have been organised at CLEF and FIRE evaluation forums, mainly on age and gender identification in social media, although also personality recognition in Twitter as well as in code sources was also addressed. In 2017 the PAN author profiling shared task addresses jointly gender and language variety identification in Twitter where tweets have been annotated with authors’ gender and their specific variation of their native language: English (Australia, Canada, Great Britain, Ireland, New Zealand, United States), Spanish (Argentina, Chile, Colombia, Mexico, Peru, Spain, Venezuela), Portuguese (Brazil, Portugal), and Arabic (Egypt, Gulf, Levantine, Maghrebi).

pdf
Learning Multimodal Gender Profile using Neural Networks
Carlos Pérez Estruch | Roberto Paredes Palacios | Paolo Rosso
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Gender identification in social networks is one of the most popular aspects of user profile learning. Traditionally it has been linked to author profiling, a difficult problem to solve because of the little difference in the use of language between genders. This situation has led to the need of taking into account other information apart from textual data, favoring the emergence of multimodal data. The aim of this paper is to apply neural networks to perform data fusion, using an existing multimodal corpus, the NUS-MSS data set, that (not only) contains text data, but also image and location information. We improved previous results in terms of macro accuracy (87.8%) obtaining the state-of-the-art performance of 91.3%.

pdf
Sentence Alignment Methods for Improving Text Simplification Systems
Sanja Štajner | Marc Franco-Salvador | Simone Paolo Ponzetto | Paolo Rosso | Heiner Stuckenschmidt
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.

2016

pdf
UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering
Marc Franco-Salvador | Sudipta Kar | Thamar Solorio | Paolo Rosso
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Mohamed Outahajala | Paolo Rosso
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Like most of the languages which have only recently started being investigated for the Natural Language Processing (NLP) tasks, Amazigh lacks annotated corpora and tools and still suffers from the scarcity of linguistic tools and resources. The main aim of this paper is to present a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. In line with our goal we have trained Conditional Random Fields (CRFs) to build a POS tagger for the Amazigh language. We have used the 10-fold technique to evaluate and validate our approach. The CRFs 10 folds average level is 87.95% and the best fold level result is 91.18%. In order to improve this result, we have gathered a set of about 8k words with their POS tags. The collected lexicon was used with CRFs confidence measure in order to have a more accurate POS-tagger. Hence, we have obtained a better performance of 93.82%.

pdf
Arabic WordNet: New Content and New Applications
Yasser Regragui | Lahsen Abouenour | Fettoum Krieche | Karim Bouzoubaa | Paolo Rosso
Proceedings of the 8th Global WordNet Conference (GWC)

The Arabic WordNet project has provided the Arabic Natural Language Processing (NLP) community with the first WordNet-compliant resource. It allowed new possibilities in terms of building sophisticated NLP applications related to this Semitic language. In this paper, we present the new content added to this resource, using semi-automatic techniques, and validated by Arabic native-speaker lexicographers. We also present how this content helps in the implementation of new Arabic NLP applications, especially for Question Answering (QA) systems. The obtained results show the contribution of the added content. The resource, fully transformed into the standard Lexical Markup Framework (LMF), is made available for the community.

2015

pdf bib
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
Martha Palmer | Gemma Boleda | Paolo Rosso
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf
SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter
Aniruddha Ghosh | Guofu Li | Tony Veale | Paolo Rosso | Ekaterina Shutova | John Barnden | Antonio Reyes
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf
Classification of deceptive opinions using a low dimensionality representation
Leticia Cagnina | Paolo Rosso
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf
Distributed Representations of Words and Documents for Discriminating Similar Languages
Marc Franco-Salvador | Paolo Rosso | Francisco Rangel
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

pdf
NLEL UPV Autoritas Participation at Discrimination between Similar Languages (DSL) 2015 Shared Task
Raül Fabra-Boluda | Francisco Rangel | Paolo Rosso
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2014

pdf
English-to-Hindi system description for WMT 2014: Deep Source-Context Features for Moses
Marta R. Costa-jussà | Parth Gupta | Paolo Rosso | Rafael E. Banchs
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
SocialIrony
Paolo Rosso
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

pdf
Enrichment of Bilingual Dictionary through News Stream Data
Ajay Dubey | Parth Gupta | Vasudeva Varma | Paolo Rosso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Bilingual dictionaries are the key component of the cross-lingual similarity estimation methods. Usually such dictionary generation is accomplished by manual or automatic means. Automatic generation approaches include to exploit parallel or comparable data to derive dictionary entries. Such approaches require large amount of bilingual data in order to produce good quality dictionary. Many time the language pair does not have large bilingual comparable corpora and in such cases the best automatic dictionary is upper bounded by the quality and coverage of such corpora. In this work we propose a method which exploits continuous quasi-comparable corpora to derive term level associations for enrichment of such limited dictionary. Though we propose our experiments for English and Hindi, our approach can be easily extendable to other languages. We evaluated dictionary by manually computing the precision. In experiments we show our approach is able to derive interesting term level associations across languages.

pdf
A Knowledge-based Representation for Cross-Language Document Retrieval and Categorization
Marc Franco-Salvador | Paolo Rosso | Roberto Navigli
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?
Upendra Sapkota | Thamar Solorio | Manuel Montes | Steven Bethard | Paolo Rosso
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf
Intrinsic Plagiarism Detection using N-gram Classes
Imene Bensalem | Paolo Rosso | Salim Chikhi
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf
Using PU-Learning to Detect Deceptive Opinion Spam
Donato Hernández Fusilier | Rafael Guzmán Cabrera | Manuel Montes-y-Gómez | Paolo Rosso
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf
INAOE_UPV-CORE: Extracting Word Associations from Document Corpora to estimate Semantic Textual Similarity
Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso | Luis Villaseñor-Pineda
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf
Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection
Alberto Barrón-Cedeño | Marta Vila | M. Antònia Martí | Paolo Rosso
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

pdf
Modelling Fixated Discourse in Chats with Cyberpedophiles
Dasha Bogdanova | Paolo Rosso | Thamar Solorio
Proceedings of the Workshop on Computational Approaches to Deception Detection

pdf
Text Reuse with ACL: (Upward) Trends
Parth Gupta | Paolo Rosso
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

pdf
On the Impact of Sentiment and Emotion Based Features in Detecting Online Sexual Predators
Dasha Bogdanova | Paolo Rosso | Thamar Solorio
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

pdf
Expected Divergence Based Feature Selection for Learning to Rank
Parth Gupta | Paolo Rosso
Proceedings of COLING 2012: Posters

pdf
Evaluating the Similarity Estimator component of the TWIN Personality-based Recommender System
Alexandra Roshchina | John Cardiff | Paolo Rosso
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

With the constant increase in the amount of information available in online communities, the task of building an appropriate Recommender System to support the user in her decision making process is becoming more and more challenging. In addition to the classical collaborative filtering and content based approaches, taking into account ratings, preferences and demographic characteristics of the users, a new type of Recommender System, based on personality parameters, has been emerging recently. In this paper we describe the TWIN (Tell Me What I Need) Personality Based Recommender System, and report on our experiments and experiences of utilizing techniques which allow the extraction of the personality type from text (following the Big Five model popular in the psychological research). We estimate the possibility of constructing the personality-based Recommender System that does not require users to fill in personality questionnaires. We are applying the proposed system in the online travelling domain to perform TripAdvisor hotels recommendation by analysing the text of user generated reviews, which are freely accessible from the community website.

pdf bib
DeSoCoRe: Detecting Source Code Re-Use across Programming Languages
Enrique Flores | Alberto Barrón-Cedeño | Paolo Rosso | Lidia Moreno
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf
Mining Subjective Knowledge from Customer Reviews: A Specific Case of Irony Detection
Antonio Reyes | Paolo Rosso
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf
On the Difficulty of Clustering Microblog Texts for Online Reputation Management
Fernando Perez-Tellez | David Pinto | John Cardiff | Paolo Rosso
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf
User Profile Construction in the TWIN Personality-based Recommender System
Alexandra Roshchina | John Cardiff | Paolo Rosso
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

2010

pdf
Corpus and Evaluation Measures for Automatic Plagiarism Detection
Alberto Barrón-Cedeño | Martin Potthast | Paolo Rosso | Benno Stein
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The simple access to texts on digital libraries and the World Wide Web has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts in the analysis of documents for plagiarism. The methods can be divided into two main approaches: intrinsic and external. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes since they cannot be properly anonymized. Therefore, current evaluations found in the literature are incomparable and, very often not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism useful for the evaluation of intrinsic as well as external plagiarism detection. Additionally, new detection performance measures tailored to the evaluation of plagiarism detection algorithms are proposed.

pdf
English-Spanish Large Statistical Dictionary of Inflectional Forms
Grigori Sidorov | Alberto Barrón-Cedeño | Paolo Rosso
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using as input data a traditional bilingual dictionary, and not parallel corpora. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on distribution of corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, verb in past tense in language L normally is expected to be translated by verb in past tense in Language L'. We consider that the developed method is universal, i.e. can be applied to any pair of languages. The obtained dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.

pdf
Evaluation Protocol and Tools for Question-Answering on Speech Transcripts
Nicolas Moreau | Olivier Hamon | Djamel Mostefa | Sophie Rosset | Olivier Galibert | Lori Lamel | Jordi Turmo | Pere R. Comas | Paolo Rosso | Davide Buscaldi | Khalid Choukri
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Question Answering (QA) technology aims at providing relevant answers to natural language questions. Most Question Answering research has focused on mining document collections containing written texts to answer written questions. In addition to written sources, a large (and growing) amount of potentially interesting information appears in spoken documents, such as broadcast news, speeches, seminars, meetings or telephone conversations. The QAST track (Question-Answering on Speech Transcripts) was introduced in CLEF to investigate the problem of question answering in such audio documents. This paper describes in detail the evaluation protocol and tools designed and developed for the CLEF-QAST evaluation campaigns that have taken place between 2007 and 2009. We first remind the data, question sets, and submission procedures that were produced or set up during these three campaigns. As for the evaluation procedure, the interface that was developed to ease the assessors’ work is described. In addition, this paper introduces a methodology for a semi-automatic evaluation of QAST systems based on time slot comparisons. Finally, the QAST Evaluation Package 2007-2009 resulting from these evaluation campaigns is also introduced.

pdf
Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis
Polina Panicheva | John Cardiff | Paolo Rosso
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Subjectivity analysis and authorship attribution are very popular areas of research. However, work in these two areas has been done separately. We believe that by combining information about subjectivity in texts and authorship, the performance of both tasks can be improved. In the paper a personalized approach to opinion mining is presented, in which the notions of personal sense and idiolect are introduced; the approach is applied to the polarity classification task. It is assumed that different authors express their private states in text individually, and opinion mining results could be improved by analyzing texts by different authors separately. The hypothesis is tested on a corpus of movie reviews by ten authors. The results of applying the personalized approach to opinion mining are presented, confirming that the approach increases the performance of the opinion mining task. Automatic authorship attribution is further applied to model the personalized approach, classifying documents by their assumed authorship. Although the automatic authorship classification imposes a number of limitations on the dataset for further experiments, after overcoming these issues the authorship attribution technique modeling the personalized approach confirms the increase over the baseline with no authorship information used.

pdf
Evaluating Humour Features on Web Comments
Antonio Reyes | Martin Potthast | Paolo Rosso | Benno Stein
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Research on automatic humor recognition has developed several features which discriminate funny text from ordinary text. The features have been demonstrated to work well when classifying the funniness of single sentences up to entire blogs. In this paper we focus on evaluating a set of the best humor features reported in the literature over a corpus retrieved from the Slashdot Web site. The corpus is categorized in a community-driven process according to the following tags: funny, informative, insightful, offtopic, flamebait, interesting and troll. These kinds of comments can be found on almost every large Web site; therefore, they impose a new challenge to humor retrieval since they come along with unique characteristics compared to other text types. If funny comments were retrieved accurately, they would be of a great entertainment value for the visitors of a given Web page. Our objective, thus, is to distinguish between an implicit funny comment from a not funny one. Our experiments are preliminary but nonetheless large-scale: 600,000 Web comments. We evaluate the classification accuracy of naive Bayes classifiers, decision trees, and support vector machines. The results suggested interesting findings.

pdf
Arabic Named Entity Recognition: Using Features Extracted from Noisy Data
Yassine Benajiba | Imed Zitouni | Mona Diab | Paolo Rosso
Proceedings of the ACL 2010 Conference Short Papers

pdf
Plagiarism Detection across Distant Language Pairs
Alberto Barrón-Cedeño | Paolo Rosso | Eneko Agirre | Gorka Labaka
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf
An Evaluation Framework for Plagiarism Detection
Martin Potthast | Benno Stein | Alberto Barrón-Cedeño | Paolo Rosso
Coling 2010: Posters

2009

pdf
Structure-Based Evaluation of an Arabic Semantic Query Expansion Using the JIRS Passage Retrieval System
Lahsen Abouenour | Karim Bouzoubaa | Paolo Rosso
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

2008

pdf
Geo-WordNet: Automatic Georeferencing of WordNet
Davide Buscaldi | Paolo Rosso
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

WordNet has been used extensively as a resource for the Word Sense Disambiguation (WSD) task, both as a sense inventory and a repository of semantic relationships. Recently, we investigated the possibility to use it as a resource for the Geographical Information Retrieval task, more specifically for the toponym disambiguation task, which could be considered a specialization of WSD. We found that it would be very useful to assign to geographical entities inWordNet their coordinates, especially in order to implement geometric shapebased disambiguation methods. This paper presents Geo-WordNet, an automatic annotation of WordNet with geographical coordinates. The annotation has been carried out by extracting geographical synsets from WordNet, together with their holonyms and hypernyms, and comparing them to the entries in the Wikipedia-World geographical database. A weight was calculated for each of the candidate annotations, on the basis of matches found between the database entries and synset gloss, holonyms and hypernyms. The resulting resource may be used in Geographical Information Retrieval related tasks, especially for toponym disambiguation.

pdf
Arabic Named Entity Recognition using Optimized Feature Sets
Yassine Benajiba | Mona Diab | Paolo Rosso
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf
UPV-SI: Word Sense Induction using Self Term Expansion
David Pinto | Paolo Rosso | Héctor Jiménez-Salazar
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf
UPV-WSD : Combining different WSD Methods by means of Fuzzy Borda Voting
Davide Buscaldi | Paolo Rosso
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf
Mining Knowledge fromWikipedia for the Question Answering task
Davide Buscaldi | Paolo Rosso
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Although significant advances have been made recently in the Question Answering technology, more steps have to be undertaken in order to obtain better results. Moreover, the best systems at the CLEF and TREC evaluation exercises are very complex systems based on custom-built, expensive ontologies whose aim is to provide the systems with encyclopedic knowledge. In this paper we investigated the use of Wikipedia, the open domain encyclopedia, for the Question Answering task. Previous works considered Wikipedia as a resource where to look for the answers to the questions. We focused on some different aspects of the problem, such as the validation of the answers as returned by our Question Answering System and on the use of Wikipedia “categories” in order to determine a set of patterns that should fit with the expected answer. Validation consists in, given a possible answer, saying wether it is the right one or not. The possibility to exploit the categories ofWikipedia was not considered until now. We performed our experiments using the Spanish version of Wikipedia, with the set of questions of the last CLEF Spanish monolingual exercise. Results show that Wikipedia is a potentially useful resource for the Question Answering task.

2004

pdf
The upv-unige-CIAOSENSO WSD system
Davide Buscaldi | Paolo Rosso | Francesco Masulli
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Search
Co-authors