Abstract
We present our system that participated in the shared task on the grammatical error correction of Ukrainian. We implemented two approaches that make use of large pre-trained language models and synthetic data, which have been used for error correction of English as well as low-resource languages. The first approach fine-tunes a large multilingual language model (mT5) in two stages: first on synthetic data, and then on gold data. The second approach trains a smaller seq2seq Transformer model, pre-training it on synthetic data and fine-tuning it on gold data. Our mT5-based model scored first in the “GEC only” track and placed a very close second in the “GEC+Fluency” track. Our two key innovations are (1) fine-tuning in stages, first on synthetic and then on gold data; and (2) a high-quality corruption method based on roundtrip machine translation to complement existing noisification approaches.
- Anthology ID:
- 2023.unlp-1.14
- Volume:
- Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editor:
- Mariana Romanyshyn
- Venue:
- UNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 114–120
- URL:
- https://aclanthology.org/2023.unlp-1.14
- DOI:
- 10.18653/v1/2023.unlp-1.14
- Cite (ACL):
- Frank Palma Gomez, Alla Rozovskaya, and Dan Roth. 2023. A Low-Resource Approach to the Grammatical Error Correction of Ukrainian. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 114–120, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- A Low-Resource Approach to the Grammatical Error Correction of Ukrainian (Palma Gomez et al., UNLP 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.unlp-1.14.pdf