Grammatical error detection in transcriptions of spoken English
Andrew Caines, Christian Bentz, Kate Knill, Marek Rei, Paula Buttery
Abstract
We describe the collection of transcription corrections and grammatical error annotations for the CrowdED Corpus of spoken English monologues on business topics. The corpus recordings were crowdsourced from native speakers of English and learners of English with German as their first language. The new transcriptions and annotations are obtained from different crowdworkers: we analyse the 1108 new crowdworker submissions and propose that they can be used for automatic transcription post-editing and grammatical error correction for speech. To further explore the data we train grammatical error detection models with various configurations including pre-trained and contextual word representations as input, additional features and auxiliary objectives, and extra training data from written error-annotated corpora. We find that a model concatenating pre-trained and contextual word representations as input performs best, and that additional information does not lead to further performance gains.
- Anthology ID:
- 2020.coling-main.195
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2144–2162
- URL:
- https://aclanthology.org/2020.coling-main.195
- DOI:
- 10.18653/v1/2020.coling-main.195
- Cite (ACL):
- Andrew Caines, Christian Bentz, Kate Knill, Marek Rei, and Paula Buttery. 2020. Grammatical error detection in transcriptions of spoken English. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2144–2162, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Grammatical error detection in transcriptions of spoken English (Caines et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2020.coling-main.195.pdf
- Data
- English Web Treebank, FCE, JFLEG