Proceedings of the First Workshop on Abusive Language Online

Zeerak Waseem, Wendy Hui Kyong Chung, Dirk Hovy, Joel Tetreault (Editors)

Anthology ID:
Vancouver, BC, Canada
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the First Workshop on Abusive Language Online
Zeerak Waseem | Wendy Hui Kyong Chung | Dirk Hovy | Joel Tetreault

pdf bib
Dimensions of Abusive Language on Twitter
Isobelle Clarke | Jack Grieve

In this paper, we use a new categorical form of multidimensional register analysis to identify the main dimensions of functional linguistic variation in a corpus of abusive language, consisting of racist and sexist Tweets. By analysing the use of a wide variety of parts-of-speech and grammatical constructions, as well as various features related to Twitter and computer-mediated communication, we discover three dimensions of linguistic variation in this corpus, which we interpret as being related to the degree of interactive, antagonistic and attitudinal language exhibited by individual Tweets. We then demonstrate that there is a significant functional difference between racist and sexist Tweets, with sexists Tweets tending to be more interactive and attitudinal than racist Tweets.

pdf bib
Constructive Language in News Comments
Varada Kolhatkar | Maite Taboada

We discuss the characteristics of constructive news comments, and present methods to identify them. First, we define the notion of constructiveness. Second, we annotate a corpus for constructiveness. Third, we explore whether available argumentation corpora can be useful to identify constructiveness in news comments. Our model trained on argumentation corpora achieves a top accuracy of 72.59% (baseline=49.44%) on our crowd-annotated test data. Finally, we examine the relation between constructiveness and toxicity. In our crowd-annotated data, 21.42% of the non-constructive comments and 17.89% of the constructive comments are toxic, suggesting that non-constructive comments are not much more toxic than constructive comments.

Rephrasing Profanity in Chinese Text
Hui-Po Su | Zhen-Jie Huang | Hao-Tsung Chang | Chuan-Jie Lin

This paper proposes a system that can detect and rephrase profanity in Chinese text. Rather than just masking detected profanity, we want to revise the input sentence by using inoffensive words while keeping their original meanings. 29 of such rephrasing rules were invented after observing sentences on real-word social websites. The overall accuracy of the proposed system is 85.56%

Deep Learning for User Comment Moderation
John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos

Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of EnglishWikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism improves further the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Joan Serrà | Ilias Leontiadis | Dimitris Spathis | Gianluca Stringhini | Jeremy Blackburn | Athena Vakali

Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the ‘ability to describe’ seen documents to the ‘ability to predict’ unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4-11%.

One-step and Two-step Classification for Abusive Language Detection on Twitter
Ji Ho Park | Pascale Fung

Automatic abusive language detection is a difficult but important task for online social media. Our research explores a two-step approach of performing classification on abusive language and then classifying into specific types and compares it with one-step approach of doing one multi-class classification for detecting sexist and racist languages. With a public English Twitter corpus of 20 thousand tweets in the type of sexism and racism, our approach shows a promising performance of 0.827 F-measure by using HybridCNN in one-step and 0.824 F-measure by using logistic regression in two-steps.

Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene
Darja Fišer | Tomaž Erjavec | Nikola Ljubešić

In this paper we present the legal framework, dataset and annotation schema of socially unacceptable discourse practices on social networking platforms in Slovenia. On this basis we aim to train an automatic identification and classification system with which we wish contribute towards an improved methodology, understanding and treatment of such practices in the contemporary, increasingly multicultural information society.

Abusive Language Detection on Arabic Social Media
Hamdy Mubarak | Kareem Darwish | Walid Magdy

In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations the site’s rules and guidelines.

Vectors for Counterspeech on Twitter
Lucas Wright | Derek Ruths | Kelly P Dillon | Haji Mohammad Saleem | Susan Benesch

A study of conversations on Twitter found that some arguments between strangers led to favorable change in discourse and even in attitudes. The authors propose that such exchanges can be usefully distinguished according to whether individuals or groups take part on each side, since the opportunity for a constructive exchange of views seems to vary accordingly.

Detecting Nastiness in Social Media
Niloofar Safi Samghabadi | Suraj Maharjan | Alan Sprague | Raquel Diaz-Sprague | Thamar Solorio

Although social media has made it easy for people to connect on a virtually unlimited basis, it has also opened doors to people who misuse it to undermine, harass, humiliate, threaten and bully others. There is a lack of adequate resources to detect and hinder its occurrence. In this paper, we present our initial NLP approach to detect invective posts as a first step to eventually detect and deter cyberbullying. We crawl data containing profanities and then determine whether or not it contains invective. Annotations on this data are improved iteratively by in-lab annotations and crowdsourcing. We pursue different NLP approaches containing various typical and some newer techniques to distinguish the use of swear words in a neutral way from those instances in which they are used in an insulting way. We also show that this model not only works for our data set, but also can be successfully applied to different data sets.

Technology Solutions to Combat Online Harassment
George Kennedy | Andrew McCollough | Edward Dixon | Alexei Bastidas | John Ryan | Chris Loo | Saurav Sahay

This work is part of a new initiative to use machine learning to identify online harassment in social media and comment streams. Online harassment goes under-reported due to the reliance on humans to identify and report harassment, reporting that is further slowed by requirements to fill out forms providing context. In addition, the time for moderators to respond and apply human judgment can take days, but response times in terms of minutes are needed in the online context. Though some of the major social media companies have been doing proprietary work in automating the detection of harassment, there are few tools available for use by the public. In addition, the amount of labeled online harassment data and availability of cross-platform online harassment datasets is limited. We present the methodology used to create a harassment dataset and classifier and the dataset used to help the system learn what harassment looks like.

Understanding Abuse: A Typology of Abusive Language Detection Subtasks
Zeerak Waseem | Thomas Davidson | Dana Warmsley | Ingmar Weber

As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse we propose a typology that captures central similarities and differences between subtasks and discuss the implications of this for data annotation and feature construction. We emphasize the practical actions that can be taken by researchers to best approach their abusive language detection subtask of interest.

Using Convolutional Neural Networks to Classify Hate-Speech
Björn Gambäck | Utpal Kumar Sikdar

The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism) and non-hate-speech. Four Convolutional Neural Network models were trained on resp. character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max-pooling, and a softmax function used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score.

Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations
Alexis Palmer | Melissa Robinson | Kristy K. Phillips

This paper focuses on a particular type of abusive language, targeting expressions in which typically neutral adjectives take on pejorative meaning when used as nouns - compare ‘gay people’ to ‘the gays’. We first collect and analyze a corpus of hand-curated, expert-annotated pejorative nominalizations for four target adjectives: female, gay, illegal, and poor. We then collect a second corpus of automatically-extracted and POS-tagged, crowd-annotated tweets. For both corpora, we find support for the hypothesis that some adjectives, when nominalized, take on negative meaning. The targeted constructions are non-standard yet widely-used, and part-of-speech taggers mistag some nominal forms as adjectives. We implement a tool called NomCatcher to correct these mistaggings, and find that the same tool is effective for identifying new adjectives subject to transformation via nominalization into abusive language.