OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification

Sowmya Vajjala, Ivana Lučić


Abstract
This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on the topics of readability assessment and text simplification.
Anthology ID:
W18-0535
Volume:
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Joel Tetreault, Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
297–304
URL:
https://aclanthology.org/W18-0535
DOI:
10.18653/v1/W18-0535
Cite (ACL):
Sowmya Vajjala and Ivana Lučić. 2018. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification (Vajjala & Lučić, BEA 2018)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/W18-0535.pdf
Data
OneStopEnglish