Maite Taboada

Also published as: M. Taboada

2023

Agonism plays a vital role in democratic dialogue by fostering diverse perspectives and robust discussions. Within the realm of online conflict there is another type: hateful antagonism, which undermines constructive dialogue. Detecting conflict online is central to platform moderation and monetization. It is also vital for democratic dialogue, but only when it takes the form of agonism. To model these two types of conflict, we collected Twitter conversations related to trending controversial topics. We introduce a comprehensive annotation schema for labelling different dimensions of conflict in the conversations, such as the source of conflict, the target, and the rhetorical strategies deployed. Using this schema, we annotated approximately 4,000 conversations with multiple labels. We then train both logistic regression and transformer-based models on the dataset, incorporating context from the conversation, including the number of participants and the structure of the interactions. Results show that contextual labels are helpful in identifying conflict and make the models robust to variations in topic. Our research contributes a conceptualization of different dimensions of conflict, a richly annotated dataset, and promising results that can contribute to content moderation.

2018

Introduction to the Special Issue on Language in Social Media: Exploiting Discourse and Other Contextual Information
Farah Benamara | Diana Inkpen | Maite Taboada
Computational Linguistics, Volume 44, Issue 4 - December 2018

Social media content is changing the way people interact with each other and share information, personal messages, and opinions about situations, objects, and past experiences. Most social media texts are short online conversational posts or comments that do not contain enough information for natural language processing (NLP) tools, as they are often accompanied by non-linguistic contextual information, including meta-data (e.g., the user’s profile, the social network of the user, and their interactions with other users). Exploiting such different types of context and their interactions makes the automatic processing of social media texts a challenging research task. Indeed, simply applying traditional text mining tools is clearly sub-optimal, as, typically, these tools take into account neither the interactive dimension nor the particular nature of this data, which shares properties with both spoken and written language. This special issue contributes to a deeper understanding of the role of these interactions to process social media data from a new perspective in discourse interpretation. This introduction first provides the necessary background to understand what context is from both the linguistic and computational linguistic perspectives, then presents the most recent context-based approaches to NLP for social media. We conclude with an overview of the papers accepted in this special issue, highlighting what we believe are the future directions in processing social media texts.

The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity
Fatemeh Torabi Asr | Maite Taboada
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

Misinformation detection at the level of full news articles is a text classification problem. Reliably labeled data in this domain is rare. Previous work relied on news articles collected from so-called “reputable” and “suspicious” websites and labeled accordingly. We leverage fact-checking websites to collect individually labeled news articles with regard to the veracity of their content and use this data to test the cross-domain generalization of a classifier trained on bigger text collections but labeled according to source reputation. Our results suggest that reputation-based classification is not sufficient for predicting the veracity level of the majority of news articles, and that the system performance on different test datasets depends on topic distribution. Therefore, collecting well-balanced and carefully assessed training data is a priority for developing robust misinformation detection systems.

2017

Constructive Language in News Comments
Varada Kolhatkar | Maite Taboada
Proceedings of the First Workshop on Abusive Language Online

We discuss the characteristics of constructive news comments, and present methods to identify them. First, we define the notion of constructiveness. Second, we annotate a corpus for constructiveness. Third, we explore whether available argumentation corpora can be useful to identify constructiveness in news comments. Our model trained on argumentation corpora achieves a top accuracy of 72.59% (baseline=49.44%) on our crowd-annotated test data. Finally, we examine the relation between constructiveness and toxicity. In our crowd-annotated data, 21.42% of the non-constructive comments and 17.89% of the constructive comments are toxic, suggesting that non-constructive comments are not much more toxic than constructive comments.

The Good, the Bad, and the Disagreement: Complex ground truth in rhetorical structure analysis
Debopam Das | Manfred Stede | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Using lexical level information in discourse structures for Basque sentiment analysis
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Using New York Times Picks to Identify Constructive Comments
Varada Kolhatkar | Maite Taboada
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

We examine the extent to which we are able to automatically identify constructive online comments. We build several classifiers using New York Times Picks as positive examples and non-constructive thread comments from the Yahoo News Annotated Comments Corpus as negative examples of constructive online comments. We evaluate these classifiers on a crowd-annotated corpus containing 1,121 comments. Our best classifier achieves a top F1 score of 0.84.

Evaluative Language Beyond Bags of Words: Linguistic Insights and Computational Applications
Farah Benamara | Maite Taboada | Yannick Mathieu
Computational Linguistics, Volume 43, Issue 1 - April 2017

The study of evaluation, affect, and subjectivity is a multidisciplinary enterprise, including sociology, psychology, economics, linguistics, and computer science. A number of excellent computational linguistics and linguistic surveys of the field exist. Most surveys, however, do not bring the two disciplines together to show how methods from linguistics can benefit computational sentiment analysis systems. In this survey, we show how incorporating linguistic insights, discourse information, and other contextual phenomena, in combination with the statistical exploitation of data, can result in an improvement over approaches that take advantage of only one of these perspectives. We first provide a comprehensive introduction to evaluative language from both a linguistic and computational perspective. We then argue that the standard computational definition of the concept of evaluative language neglects the dynamic nature of evaluation, in which the interpretation of a given evaluation depends on linguistic and extra-linguistic contextual factors. We thus propose a dynamic definition that incorporates update functions. The update functions allow for different contextual aspects to be incorporated into the calculation of sentiment for evaluative words or expressions, and can be applied at all levels of discourse. We explore each level and highlight which linguistic aspects contribute to accurate extraction of sentiment. We end the review by outlining what we believe the future directions of sentiment analysis are, and the role that discourse and contextual information need to play.

This paper presents a freely available resource for research on handling negation and speculation in review texts. The SFU Review Corpus, consisting of 400 documents of movie, book, and consumer product reviews, was annotated at the token level with negative and speculative keywords and at the sentence level with their linguistic scope. We report statistics on corpus size and the consistency of annotations. The annotated corpus will be useful in many applications, such as document mining and sentiment analysis.

2011

Lexicon-Based Methods for Sentiment Analysis
Maite Taboada | Julian Brooke | Milan Tofiloski | Kimberly Voll | Manfred Stede
Computational Linguistics, Volume 37, Issue 2 - June 2011

2009

A Syntactic and Lexical-Based Discourse Segmenter
Milan Tofiloski | Julian Brooke | Maite Taboada
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Genre-Based Paragraph Classification for Sentiment Analysis
Maite Taboada | Julian Brooke | Manfred Stede
Proceedings of the SIGDIAL 2009 Conference

Cross-Linguistic Sentiment Analysis: From English to Spanish
Julian Brooke | Milan Tofiloski | Maite Taboada
Proceedings of the International Conference RANLP-2009

2006

Prosodic Correlates of Rhetorical Relations
Gabriel Murray | Maite Taboada | Steve Renals
Proceedings of the Analyzing Conversations in Text and Speech

Methods for Creating Semantic Orientation Dictionaries
Maite Taboada | Caroline Anthony | Kimberly Voll
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe and compare different methods for creating a dictionary of words with their corresponding semantic orientation (SO). We tested how well different dictionaries helped determine the SO of entire texts. To extract SO for each individual word, we used a common method based on pointwise mutual information. Mutual information between a set of seed words and the target words was calculated using two different methods: a NEAR search on the search engine Altavista (since discontinued) and an AND search on Google. These two dictionaries were tested against a manually annotated dictionary of positive and negative words. The results show that all three methods are quite close, and none of them performs particularly well. We discuss possible further avenues for research, and also point out some potential problems in calculating pointwise mutual information using Google.
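
The abstract above does not spell out the formula, so the following is only a minimal sketch of one common way to derive semantic orientation from pointwise mutual information over search hit counts. It assumes the well-known formulation in which SO is the PMI of a word with positive seed words minus its PMI with negative seed words; the function name `so_pmi`, its parameters, the smoothing constant, and all counts are hypothetical illustrations, not the authors' implementation or data.

```python
import math


def so_pmi(cooc_pos, cooc_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a word from (hypothetical) search hit counts.

    cooc_pos / cooc_neg: hits for the word co-occurring with positive /
        negative seed words (e.g. via a NEAR or AND query).
    hits_pos / hits_neg: hits for the positive / negative seed words alone.

    SO = PMI(word, pos) - PMI(word, neg). The hit count of the word itself
    and the corpus size cancel out in the difference, leaving:
        log2( (cooc_pos * hits_neg) / (cooc_neg * hits_pos) )
    A small smoothing constant (an assumption here) is added to the
    co-occurrence counts to avoid division by zero and log(0).
    """
    if hits_pos <= 0 or hits_neg <= 0:
        raise ValueError("seed hit counts must be positive")
    numerator = (cooc_pos + smoothing) * hits_neg
    denominator = (cooc_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


# Made-up counts: the word co-occurs more often with positive seeds,
# so its orientation comes out positive.
score = so_pmi(cooc_pos=200, cooc_neg=50, hits_pos=1000, hits_neg=1000)
print(f"SO = {score:.3f}")
```

A positive score suggests a positive orientation and a negative score a negative one. Because the score depends entirely on the hit counts, differences between a NEAR query and an AND query feed directly into the resulting dictionary.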