Votter Corpus: A Corpus of Social Polling Language

Nathan Green, Septina Dian Larasati


Abstract
The Votter Corpus is a new annotated corpus of social polling questions and answers. The Votter Corpus is novel in its use of the mobile application format and novel in its coverage of specific demographics. With over 26,000 polls and close to 1 million votes, the Votter Corpus covers everyday question and answer language, primarily from female users between the ages of 13 and 24. The corpus is annotated by topic and by popularity of particular answers. The corpus contains many unique characteristics such as emoticons, common mobile misspellings, and images associated with many of the questions. The corpus is a collection of questions and answers from The Votter App on the Android operating system. Data is created solely on this mobile platform, which differs from most social media corpora. The Votter Corpus is being made available online in XML format for research and non-commercial use. The Votter Android app can be downloaded for free in most Android app stores.
Anthology ID:
L14-1098
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3693–3697
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1143_Paper.pdf
Cite (ACL):
Nathan Green and Septina Dian Larasati. 2014. Votter Corpus: A Corpus of Social Polling Language. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3693–3697, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Votter Corpus: A Corpus of Social Polling Language (Green & Larasati, LREC 2014)