Martin Vainikko
2025
Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian
Martin Vainikko | Taavi Kamarik | Karina Kert | Krista Liin | Silvia Maine | Kais Allkivi | Annekatrin Kaivapalu | Mark Fishel
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
We present a case study on building task-specific models for grammatical error correction and explanation generation tailored to learners of Estonian. Our approach handles whole paragraphs rather than single sentences and prompts proprietary large language models to generate synthetic training data, addressing the limited availability of error correction data and the complete absence of correction justification/explanation data for Estonian. We describe the chosen approach and pipeline and provide technical details of the experiments. The final outcome is a set of open-weight models, released under a permissive license along with the generated synthetic error correction and explanation data.
2024
To Err Is Human, but Llamas Can Learn It Too
Agnes Luhtaru | Taido Purason | Martin Vainikko | Maksym Del | Mark Fishel
Findings of the Association for Computational Linguistics: EMNLP 2024
This study explores enhancing grammatical error correction (GEC) through automatic error generation (AEG) using language models (LMs). Specifically, we fine-tune Llama 2 LMs for error generation and find that this approach yields synthetic errors akin to human errors. Next, we train GEC Llama models using these artificial errors and outperform previous state-of-the-art error correction models, with gains ranging between 0.8 and 6 F0.5 points across all tested languages (German, Ukrainian, and Estonian). Moreover, we demonstrate that generating errors by fine-tuning smaller sequence-to-sequence models and by prompting large commercial LMs (GPT-3.5 and GPT-4) also yields synthetic errors that benefit error generation models. We openly release trained models for error generation and correction as well as all the synthesized error datasets for the covered languages.