Abstract
Identifying irony in user-generated social media content has a wide range of applications; however, to date, Arabic content has received limited attention. To bridge this gap, this study builds a new open-domain Arabic corpus annotated for irony detection. We query Twitter using irony-related hashtags to collect ironic messages, which are then manually annotated by two linguists according to our working definition of irony. The challenges we encountered during annotation reflect both the inherent limitations of interpreting Twitter messages and the complexity of Arabic and its dialects. Once published, our corpus will be a valuable free resource for developing open-domain systems for automatic irony recognition in Arabic and its dialects in social media text.
- Anthology ID:
- 2020.lrec-1.768
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 6265–6271
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.768
- Cite (ACL):
- Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, and Faten Ashour. 2020. DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6265–6271, Marseille, France. European Language Resources Association.
- Cite (Informal):
- DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter (Abbes et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.lrec-1.768.pdf
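The abstract says that ironic messages were collected by querying Twitter with irony-related hashtags and were then annotated by hand. The Python sketch below illustrates only the filtering side of that step. It assumes the tweets have already been retrieved and keeps those that carry a seed hashtag after light Arabic normalization. The seed hashtags, the helper names (`normalize`, `hashtags`, `select_candidates`), and the normalization choices are hypothetical. They are not the authors' query list or pipeline, and no Twitter API calls are shown.

```python
import re
import unicodedata

# Hypothetical seed hashtags, for illustration only; not the paper's actual query list.
IRONY_HASHTAGS = frozenset({"#سخرية", "#تهكم"})

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640) vary freely in user text,
# so strip them before comparing hashtags (an assumed normalization choice).
_ARABIC_NOISE = re.compile(r"[\u064B-\u0652\u0640]")
# In Python 3 str patterns, \w matches Unicode letters, including Arabic script.
_HASHTAG = re.compile(r"#\w+")


def normalize(text: str) -> str:
    """Apply NFKC normalization, then remove Arabic diacritics and tatweel."""
    return _ARABIC_NOISE.sub("", unicodedata.normalize("NFKC", text))


def hashtags(text: str) -> set[str]:
    """Return the set of normalized hashtags that occur in a tweet."""
    return set(_HASHTAG.findall(normalize(text)))


def select_candidates(tweets: list[str], seeds=IRONY_HASHTAGS) -> list[str]:
    """Keep tweets that carry at least one seed hashtag, preserving input order.

    These are only candidates: a hashtag does not guarantee irony, so each
    selected tweet still needs manual annotation.
    """
    norm_seeds = {normalize(s) for s in seeds}
    return [t for t in tweets if hashtags(t) & norm_seeds]


if __name__ == "__main__":
    sample = [
        "الجو جميل جدا #سخرية",
        "خبر عادي #أخبار",
        "يا سلام #سـخـرية",  # the same hashtag written with tatweel
    ]
    for tweet in select_candidates(sample):
        print(tweet)
```

Matching on hashtags only produces candidate messages. According to the abstract, the irony labels themselves come from the two linguists' manual annotation against the working definition of irony.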