Abstract
We present the NewSoMe (News and Social Media) Corpus, a set of subcorpora with annotations on opinion expressions across genres (news reports, blogs, product reviews and tweets) and covering multiple languages (English, Spanish, Catalan and Portuguese). NewSoMe is the result of an effort to increase the opinion corpus resources available in languages other than English, and to build a unifying annotation framework for analyzing opinion in different genres, including controlled text, such as news reports, as well as different types of user-generated content (UGC). Given the broad design of the resource, most of the annotation effort was carried out through crowdsourcing platforms: Amazon Mechanical Turk and CrowdFlower. This created an excellent opportunity to investigate the feasibility of crowdsourcing methods for annotating large amounts of text in different languages.
- Anthology ID:
- L14-1306
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 2229–2236
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/350_Paper.pdf
- Cite (ACL):
- Roser Saurí, Judith Domingo, and Toni Badia. 2014. The NewSoMe Corpus: A Unifying Opinion Annotation Framework across Genres and in Multiple Languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2229–2236, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- The NewSoMe Corpus: A Unifying Opinion Annotation Framework across Genres and in Multiple Languages (Saurí et al., LREC 2014)