Francesco Barbieri


pdf bib
Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)
Francesco Barbieri | Jose Camacho-Collados | Bhuwan Dhingra | Luis Espinosa-Anke | Elena Gribovskaya | Angeliki Lazaridou | Daniel Loureiro | Leonardo Neves
Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)

XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond
Francesco Barbieri | Luis Espinosa Anke | Jose Camacho-Collados
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. In this paper we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages and a XLM-T model trained on this dataset.

Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio | Francesco Barbieri | Vitor Sousa | Leonardo Neves | Jose Camacho-Collados
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been mainly tested in well-formatted documents such as news, Wikipedia, or scientific articles. In social media the landscape is different, in which it adds another layer of complexity due to its noisy and dynamic nature. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and perform an analysis on the language model performance on the task, especially analyzing the impact of different time periods. In particular, we focus on three important temporal aspects in our analysis: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative to lack of recently-labeled data. TweetNER7 is released publicly ( along with the models fine-tuned on it (NER models have been integrated into TweetNLP and can be found at

TimeLMs: Diachronic Language Models from Twitter
Daniel Loureiro | Francesco Barbieri | Leonardo Neves | Luis Espinosa Anke | Jose Camacho-collados
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models’ capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift. TimeLMs is available at

TempoWiC: An Evaluation Benchmark for Detecting Meaning Shift in Social Media
Daniel Loureiro | Aminette D’Souza | Areej Nasser Muhajab | Isabella A. White | Gabriel Wong | Luis Espinosa-Anke | Leonardo Neves | Francesco Barbieri | Jose Camacho-Collados
Proceedings of the 29th International Conference on Computational Linguistics

Language evolves over time, and word meaning changes accordingly. This is especially true in social media, since its dynamic nature leads to faster semantic shifts, making it challenging for NLP models to deal with new content and trends. However, the number of datasets and models that specifically address the dynamic nature of these social platforms is scarce. To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift. Our results show that TempoWiC is a challenging benchmark, even for recently-released language models specialized in social media.

Twitter Topic Classification
Dimosthenis Antypas | Asahi Ushio | Jose Camacho-Collados | Vitor Silva | Leonardo Neves | Francesco Barbieri
Proceedings of the 29th International Conference on Computational Linguistics

Social media platforms host discussions about a wide variety of topics that arise everyday. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on tweet topic classification and release two associated datasets. Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data from recent time periods that can be used to evaluate tweet classification models. Moreover, we perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task, which provide more insights on the challenges and nature of the task.


On Transferability of Bias Mitigation Effects in Language Model Fine-Tuning
Xisen Jin | Francesco Barbieri | Brendan Kennedy | Aida Mostafazadeh Davani | Leonardo Neves | Xiang Ren
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks such as text classification and coreference resolution. Previous works focus on detecting these biases, reducing bias in data representations, and using auxiliary training objectives to mitigate bias during fine-tuning. Although these techniques achieve bias reduction for the task and domain at hand, the effects of bias mitigation may not directly transfer to new tasks, requiring additional data collection and customized annotation of sensitive attributes, and re-evaluation of appropriate fairness metrics. We explore the feasibility and benefits of upstream bias mitigation (UBM) for reducing bias on downstream tasks, by first applying bias mitigation to an upstream model through fine-tuning and subsequently using it for downstream fine-tuning. We find, in extensive experiments across hate speech detection, toxicity detection and coreference resolution tasks over various bias factors, that the effects of UBM are indeed transferable to new downstream tasks or domains via fine-tuning, creating less biased downstream models than directly fine-tuning on the downstream task or transferring from a vanilla upstream model. Though challenges remain, we show that UBM promises more efficient and accessible bias mitigation in LM fine-tuning.


The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks
Brihi Joshi | Neil Shah | Francesco Barbieri | Leonardo Neves
Proceedings of the 28th International Conference on Computational Linguistics

Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years. Extensive work shows how accurately such models can represent abstract, semantic information present in text. In this expository work, we explore a tangent direction and analyze such models’ performance on tasks that require a more granular level of representation. We focus on the problem of textual similarity from two perspectives: matching documents on a granular level (requiring embeddings to capture fine-grained attributes in the text), and an abstract level (requiring embeddings to capture overall textual semantics). We empirically demonstrate, across two datasets from different domains, that despite high performance in abstract document matching as expected, contextual embeddings are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF for more granular tasks. We then propose a simple but effective method to incorporate TF-IDF into models that use contextual embeddings, achieving relative improvements of up to 36% on granular tasks.

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
Francesco Barbieri | Jose Camacho-Collados | Luis Espinosa Anke | Leonardo Neves
Findings of the Association for Computational Linguistics: EMNLP 2020

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models, and continue training them on Twitter corpora.


Multimodal Emoji Prediction
Francesco Barbieri | Miguel Ballesteros | Francesco Ronzano | Horacio Saggion
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication, that automatic systems are not used to deal with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities, in a combined model, improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.

SemEval 2018 Task 2: Multilingual Emoji Prediction
Francesco Barbieri | Jose Camacho-Collados | Francesco Ronzano | Luis Espinosa-Anke | Miguel Ballesteros | Valerio Basile | Viviana Patti | Horacio Saggion
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018. Given the text of a tweet, the task consists of predicting the most likely emoji to be used along such tweet. Two subtasks were proposed, one for English and one for Spanish, and participants were allowed to submit a system run to one or both subtasks. In total, 49 teams participated to the English subtask and 22 teams submitted a system run to the Spanish subtask. Evaluation was carried out emoji-wise, and the final ranking was based on macro F-Score. Data and further information about this task can be found at

How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter
Francesco Barbieri | Jose Camacho-Collados
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

In this paper we analyze the use of emojis in social media with respect to gender and skin tone. By gathering a dataset of over twenty two million tweets from United States some findings are clearly highlighted after performing a simple frequency-based analysis. Moreover, we carry out a semantic analysis on the usage of emojis and their modifiers (e.g. gender and skin tone) by embedding all words, emojis and modifiers into the same vector space. Our analyses reveal that some stereotypes related to the skin color and gender seem to be reflected on the use of these modifiers. For example, emojis representing hand gestures are more widely utilized with lighter skin tones, and the usage across skin tones differs significantly. At the same time, the vector corresponding to the male modifier tends to be semantically close to emojis related to business or technology, whereas their female counterparts appear closer to emojis about love or makeup.

Interpretable Emoji Prediction via Label-Wise Attention LSTMs
Francesco Barbieri | Luis Espinosa-Anke | Jose Camacho-Collados | Steven Schockaert | Horacio Saggion
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role. While there is an increasing body of work aimed at the computational modeling of emoji semantics, there is currently little understanding about what makes a computational model represent or predict a given emoji in a certain way. In this paper we propose a label-wise attention mechanism with which we attempt to better understand the nuances underlying emoji prediction. In addition to advantages in terms of interpretability, we show that our proposed architecture improves over standard baselines in emoji prediction, and does particularly well when predicting infrequent emojis.


pdf bib
Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes
Francesco Barbieri | Luis Espinosa-Anke | Miguel Ballesteros | Juan Soler-Company | Horacio Saggion
Proceedings of the 3rd Workshop on Noisy User-generated Text

Videogame streaming platforms have become a paramount example of noisy user-generated text. These are websites where gaming is broadcasted, and allows interaction with viewers via integrated chatrooms. Probably the best known platform of this kind is Twitch, which has more than 100 million monthly viewers. Despite these numbers, and unlike other platforms featuring short messages (e.g. Twitter), Twitch has not received much attention from the Natural Language Processing community. In this paper we aim at bridging this gap by proposing two important tasks specific to the Twitch platform, namely (1) Emote prediction; and (2) Trolling detection. In our experiments, we evaluate three models: a BOW baseline, a logistic supervised classifiers based on word embeddings, and a bidirectional long short-term memory recurrent neural network (LSTM). Our results show that the LSTM model outperforms the other two models, where explicit features with proven effectiveness for similar tasks were encoded.

Are Emojis Predictable?
Francesco Barbieri | Miguel Ballesteros | Horacio Saggion
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms a baseline as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.


What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis
Francesco Barbieri | Francesco Ronzano | Horacio Saggion
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate. In this paper, we analyse emojis used in Twitter with distributional semantic models. We retrieve 10 millions tweets posted by USA users, and we build several skip gram word embedding models by mapping in the same vectorial space both words and emojis. We test our models with semantic similarity experiments, comparing the output of our models with human assessment. We also carry out an exhaustive qualitative evaluation, showing interesting results.


UPF-taln: SemEval 2015 Tasks 10 and 11. Sentiment Analysis of Literal and Figurative Language in Twitter
Francesco Barbieri | Francesco Ronzano | Horacio Saggion
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

How Topic Biases Your Results? A Case Study of Sentiment Analysis and Irony Detection in Italian
Francesco Barbieri | Francesco Ronzano | Horacio Saggion
Proceedings of the International Conference Recent Advances in Natural Language Processing


Modelling Sarcasm in Twitter, a Novel Approach
Francesco Barbieri | Horacio Saggion | Francesco Ronzano
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Modelling Irony in Twitter: Feature Analysis and Evaluation
Francesco Barbieri | Horacio Saggion
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Irony, a creative use of language, has received scarce attention from the computational linguistics research point of view. We propose an automatic system capable of detecting irony with good accuracy in the social network Twitter. Twitter allows users to post short messages (140 characters) which usually do not follow the expected rules of the grammar, users tend to truncate words and use particular punctuation. For these reason automatic detection of Irony in Twitter is not trivial and requires specific linguistic tools. We propose in this paper a new set of experiments to assess the relevance of the features included in our model. Our model does not include words or sequences of words as features, aiming to detect inner characteristic of Irony.

Modelling Irony in Twitter
Francesco Barbieri | Horacio Saggion
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics