Abstract
We present a corpus of sentences from news articles that are annotated as general or specific. We employed annotators on Amazon Mechanical Turk to mark sentences from three kinds of news articles: reports on events, finance news, and science journalism. We introduce the resulting corpus, with a focus on annotator agreement, the proportion of general/specific sentences in the articles, and results for automatic classification of the two sentence types.
- Anthology ID:
- L12-1384
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1818–1821
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/657_Paper.pdf
- Cite (ACL):
- Annie Louis and Ani Nenkova. 2012. A corpus of general and specific sentences from news. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1818–1821, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- A corpus of general and specific sentences from news (Louis & Nenkova, LREC 2012)