Abstract
We report on our work in progress on generating a synthetic error dataset for Swedish by replicating errors observed in the authentic error-annotated dataset. We analyze a small subset of authentic errors, capture regular patterns based on parts of speech, and design a set of rules to corrupt new data. We explore the approach and identify its capabilities, advantages, and limitations as a way to enrich the existing collection of error-annotated data. This work focuses on word order errors, specifically those involving the placement of finite verbs in a sentence.
- Anthology ID:
- 2022.bea-1.6
- Volume:
- Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington
- Editors:
- Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 33–38
- URL:
- https://aclanthology.org/2022.bea-1.6
- DOI:
- 10.18653/v1/2022.bea-1.6
- Cite (ACL):
- Judit Casademont Moner and Elena Volodina. 2022. Generation of Synthetic Error Data of Verb Order Errors for Swedish. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 33–38, Seattle, Washington. Association for Computational Linguistics.
- Cite (Informal):
- Generation of Synthetic Error Data of Verb Order Errors for Swedish (Casademont Moner & Volodina, BEA 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.bea-1.6.pdf
- Data:
- DaLAJ