iSarcasm: A Dataset of Intended Sarcasm

Silviu Oprea, Walid Magdy


Abstract
We consider the distinction between intended and perceived sarcasm in the context of textual sarcasm detection. The former occurs when an utterance is sarcastic from the perspective of its author, while the latter occurs when the utterance is interpreted as sarcastic by the audience. We show the limitations of previous labelling methods in capturing intended sarcasm and introduce the iSarcasm dataset of tweets labeled for sarcasm directly by their authors. Examining the state-of-the-art sarcasm detection models on our dataset showed low performance compared to previously studied datasets, which indicates that these datasets might be biased or obvious and sarcasm could be a phenomenon under-studied computationally thus far. By providing the iSarcasm dataset, we aim to encourage future NLP research to develop methods for detecting sarcasm in text as intended by the authors of the text, not as labeled under assumptions that we demonstrate to be sub-optimal.
Anthology ID:
2020.acl-main.118
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1279–1289
URL:
https://aclanthology.org/2020.acl-main.118
DOI:
10.18653/v1/2020.acl-main.118
Cite (ACL):
Silviu Oprea and Walid Magdy. 2020. iSarcasm: A Dataset of Intended Sarcasm. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1279–1289, Online. Association for Computational Linguistics.
Cite (Informal):
iSarcasm: A Dataset of Intended Sarcasm (Oprea & Magdy, ACL 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.acl-main.118.pdf
Dataset:
 2020.acl-main.118.Dataset.zip
Video:
 http://slideslive.com/38929208
Data:
iSarcasm, SARC