Offensive Language Detection Using Brown Clustering

Zuoyu Tian, Sandra Kübler


Abstract
In this study, we investigate the use of Brown clustering for offensive language detection. Brown clustering has been shown to be of little use when a task requires distinguishing word polarity, as in sentiment analysis. In contrast to previous work, we train Brown clusters separately on positive and negative sentiment data and then combine the information into a single complex feature per word. This way of representing words results in stable improvements in offensive language detection, both when the clusters are used as the only features and when they are combined with words or character n-grams. Brown clusters add important information even when combined with standard word embeddings in a convolutional neural network. However, we also found different trends between the two offensive language data sets we used.
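The idea of combining two separately trained clusterings into a single feature per word can be illustrated with a minimal sketch. It assumes each clustering is available as a word-to-bit-string dictionary; the dictionaries, their contents, the `UNK` placeholder, the `|` separator, and the function names are all hypothetical and chosen for illustration. The feature encoding actually used in the paper may differ.

```python
from typing import Dict, List

# Hypothetical word -> Brown cluster bit-string maps, one per training corpus.
# In practice these would be produced by running a Brown clustering tool
# separately on the positive and on the negative sentiment data.
POS_CLUSTERS: Dict[str, str] = {"great": "0010", "you": "0111", "idiot": "1100"}
NEG_CLUSTERS: Dict[str, str] = {"great": "1011", "you": "0101", "idiot": "0001"}

UNK = "UNK"  # assumed placeholder for words missing from a clustering


def combined_feature(word: str) -> str:
    """Join the word's two cluster IDs into one feature string."""
    pos = POS_CLUSTERS.get(word, UNK)
    neg = NEG_CLUSTERS.get(word, UNK)
    return f"{pos}|{neg}"


def featurize(tokens: List[str]) -> List[str]:
    """Map each token (lowercased) to its combined cluster feature."""
    return [combined_feature(token.lower()) for token in tokens]


features = featurize(["You", "idiot", "!"])
```

Each resulting string can then be treated as one categorical feature, either on its own or alongside word or character n-gram features.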
Anthology ID:
2020.lrec-1.625
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5079–5087
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.625
Cite (ACL):
Zuoyu Tian and Sandra Kübler. 2020. Offensive Language Detection Using Brown Clustering. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5079–5087, Marseille, France. European Language Resources Association.
Cite (Informal):
Offensive Language Detection Using Brown Clustering (Tian & Kübler, LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2020.lrec-1.625.pdf