Abstract
We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level. The corpus is labeled using both regular and crowdsourcing methods under three different conditions with two types of annotation guidelines. We describe the sub-corpora constituting the corpus and provide examples from the various SSA categories. In the process, we present our linguistically motivated and genre-nuanced annotation guidelines and provide evidence of their impact on the labeling task.
- Anthology ID:
- L12-1630
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3907–3914
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/1057_Paper.pdf
- Cite (ACL):
- Muhammad Abdul-Mageed and Mona Diab. 2012. AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3907–3914, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis (Abdul-Mageed & Diab, LREC 2012)