A World CLASSE Student Summary Corpus
Scott Crossley, Perpetual Baffour, Mihai Dascalu, Stefan Ruseti
Abstract
This paper introduces the Common Lit Augmented Student Summary Evaluation (CLASSE) corpus. The corpus comprises 11,213 summaries written over six prompts by students in grades 3-12 while using the CommonLit website. Each summary was scored by expert human raters on analytic features related to main points, details, organization, voice, paraphrasing, and language beyond the source text. The human scores were aggregated into two component scores related to content and wording. The final corpus was the focus of a Kaggle competition hosted in late 2022 and completed in 2023 in which over 2,000 teams participated. The paper includes a baseline scoring model for the corpus based on a Large Language Model (Longformer model). The paper also provides an overview of the winning models from the Kaggle competition.
- Anthology ID:
- 2024.bea-1.9
- Volume:
- Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 99–107
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.bea-1.9/
- Cite (ACL):
- Scott Crossley, Perpetual Baffour, Mihai Dascalu, and Stefan Ruseti. 2024. A World CLASSE Student Summary Corpus. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pages 99–107, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- A World CLASSE Student Summary Corpus (Crossley et al., BEA 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.bea-1.9.pdf