Elda Weizman
2025
Irony Detection in Hebrew Documents: A Novel Dataset and an Evaluation of Neural Classification Methods
Avi Shmidman
|
Elda Weizman
|
Avishay Gerczuk
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
This paper focuses on the use of single words in quotation marks in Hebrew, which may or may not be an indication of irony. Because no annotated dataset yet exists for such cases, we annotate a new dataset consisting of over 4000 cases of words within quotation marks from Hebrew newspapers. On the basis of this dataset, we train and evaluate a series of seven BERT-based classifiers for irony detection, identifying the features and configurations that most effectively contribute the irony detection task. We release this novel dataset to the NLP community to promote future research and benchmarking regarding irony detection in Hebrew.