Irony Detection in Hebrew Documents: A Novel Dataset and an Evaluation of Neural Classification Methods

Avi Shmidman, Elda Weizman, Avishay Gerczuk


Abstract
This paper focuses on the use of single words in quotation marks in Hebrew, which may or may not be an indication of irony. Because no annotated dataset yet exists for such cases, we annotate a new dataset consisting of over 4000 cases of words within quotation marks from Hebrew newspapers. On the basis of this dataset, we train and evaluate a series of seven BERT-based classifiers for irony detection, identifying the features and configurations that contribute most effectively to the irony detection task. We release this novel dataset to the NLP community to promote future research and benchmarking regarding irony detection in Hebrew.
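As a rough illustration of the kind of candidate the abstract describes (a single Hebrew word enclosed in quotation marks), the following Python sketch collects such words from raw text. It is not the authors' extraction pipeline: the function name `find_quoted_words`, the set of quote characters, and the rule that skips a quote mark directly preceded by a Hebrew letter (as in acronyms such as צה"ל) are all assumptions made for this example.

```python
import re

# Assumed quote characters: ASCII double quote, curly quotes, low-9 quote,
# and the Hebrew gershayim mark (U+05F4). The source does not specify these.
QUOTE_OPEN = '"\u201c\u201e\u05f4'
QUOTE_CLOSE = '"\u201d\u05f4'

# A run of Hebrew letters (alef through tav, including final forms).
HEBREW_LETTER = r'[\u05d0-\u05ea]'

# A quote mark that is not glued to a preceding Hebrew letter, then exactly
# one Hebrew word, then a closing quote not followed by a Hebrew letter.
PATTERN = re.compile(
    rf'(?<!{HEBREW_LETTER})[{QUOTE_OPEN}]({HEBREW_LETTER}+)[{QUOTE_CLOSE}](?!{HEBREW_LETTER})'
)


def find_quoted_words(text):
    """Return (word, start_offset) pairs for single quoted Hebrew words."""
    return [(m.group(1), m.start(1)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = 'הוא "מומחה" בתחום, כך מסר דובר צה"ל'
    for word, offset in find_quoted_words(sample):
        print(offset, word)
```

Each candidate found this way would still need a label (ironic or not) and its surrounding sentence before it could serve as classifier input; the sketch covers only the surface pattern.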
Anthology ID:
2025.nlp4dh-1.9
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
91–101
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.9/
Cite (ACL):
Avi Shmidman, Elda Weizman, and Avishay Gerczuk. 2025. Irony Detection in Hebrew Documents: A Novel Dataset and an Evaluation of Neural Classification Methods. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 91–101, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Irony Detection in Hebrew Documents: A Novel Dataset and an Evaluation of Neural Classification Methods (Shmidman et al., NLP4DH 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.9.pdf