Abstract
This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.
- Anthology ID:
- 2021.ranlp-1.102
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 895–900
- URL:
- https://aclanthology.org/2021.ranlp-1.102
- Cite (ACL):
- Mihai Manolescu and Çağrı Çöltekin. 2021. ROFF - A Romanian Twitter Dataset for Offensive Language. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 895–900, Held Online. INCOMA Ltd.
- Cite (Informal):
- ROFF - A Romanian Twitter Dataset for Offensive Language (Manolescu & Çöltekin, RANLP 2021)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2021.ranlp-1.102.pdf