CEPOC: The Cambridge Exams Publishing Open Cloze dataset
Mariano Felice, Shiva Taslimipoor, Øistein E. Andersen, Paula Buttery
Abstract
Open cloze tests are a standard type of exercise where examinees must complete a text by filling in the gaps without any given options to choose from. This paper presents the Cambridge Exams Publishing Open Cloze (CEPOC) dataset, a collection of open cloze tests from world-renowned English language proficiency examinations. The tests in CEPOC have been expertly designed and validated using standard principles in language research and assessment. They are prepared for language learners at different proficiency levels and hence classified into different CEFR levels (A2, B1, B2, C1, C2). This resource can be a valuable testbed for various NLP tasks. We perform a complete set of experiments on three tasks: gap filling, gap prediction, and CEFR text classification. We implement transformer-based systems based on pre-trained language models to model each task and use our dataset as a test set, providing promising benchmark results.
- Anthology ID:
- 2022.lrec-1.456
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4285–4290
- URL:
- https://aclanthology.org/2022.lrec-1.456
- Cite (ACL):
- Mariano Felice, Shiva Taslimipoor, Øistein E. Andersen, and Paula Buttery. 2022. CEPOC: The Cambridge Exams Publishing Open Cloze dataset. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4285–4290, Marseille, France. European Language Resources Association.
- Cite (Informal):
- CEPOC: The Cambridge Exams Publishing Open Cloze dataset (Felice et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.lrec-1.456.pdf
- Code:
- cambridgealta/cepoc