ROFF - A Romanian Twitter Dataset for Offensive Language

Mihai Manolescu; Çağrı Çöltekin

Abstract

This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.

Anthology ID: 2021.ranlp-1.102
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 895–900
URL: https://aclanthology.org/2021.ranlp-1.102
Cite (ACL): Mihai Manolescu and Çağrı Çöltekin. 2021. ROFF - A Romanian Twitter Dataset for Offensive Language. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 895–900, Held Online. INCOMA Ltd.
Cite (Informal): ROFF - A Romanian Twitter Dataset for Offensive Language (Manolescu & Çöltekin, RANLP 2021)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2021.ranlp-1.102.pdf
