Yujin Takahashi

2022

In grammatical error correction (GEC), automatic evaluation is considered as an important factor for research and development of GEC systems. Previous studies on automatic evaluation have shown that quality estimation models built from datasets with manual evaluation can achieve high performance in automatic evaluation of English GEC. However, quality estimation models have not yet been studied in Japanese, because there are no datasets for constructing quality estimation models. In this study, therefore, we created a quality estimation dataset with manual evaluation to build an automatic evaluation model for Japanese GEC. By building a quality estimation model using this dataset and conducting a meta-evaluation, we verified the usefulness of the quality estimation model for Japanese GEC.

pdf abs
ProQE: Proficiency-wise Quality Estimation dataset for Grammatical Error Correction
Yujin Takahashi | Masahiro Kaneko | Masato Mita | Mamoru Komachi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This study investigates how supervised quality estimation (QE) models of grammatical error correction (GEC) are affected by the learners’ proficiency with the data. QE models for GEC evaluations in prior work have obtained a high correlation with manual evaluations. However, when functioning in a real-world context, the data used for the reported results have limitations because prior works were biased toward data by learners with relatively high proficiency levels. To address this issue, we created a QE dataset that includes multiple proficiency levels and explored the necessity of performing proficiency-wise evaluation for QE of GEC. Our experiments demonstrated that differences in evaluation dataset proficiency affect the performance of QE models, and proficiency-wise evaluation helps create more robust models.

2020

pdf abs
Grammatical Error Correction Using Pseudo Learner Corpus Considering Learner’s Error Tendency
Yujin Takahashi | Satoru Katsumata | Mamoru Komachi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Recently, several studies have focused on improving the performance of grammatical error correction (GEC) tasks using pseudo data. However, a large amount of pseudo data are required to train an accurate GEC model. To address the limitations of language and computational resources, we assume that introducing pseudo errors into sentences similar to those written by the language learners is more efficient, rather than incorporating random pseudo errors into monolingual data. In this regard, we study the effect of pseudo data on GEC task performance using two approaches. First, we extract sentences that are similar to the learners’ sentences from monolingual data. Second, we generate realistic pseudo errors by considering error types that learners often make. Based on our comparative results, we observe that F0.5 scores for the Russian GEC task are significantly improved.

Co-authors

Tosho Hirasawa 1

Michitaka Nakatsuji 1

Masahiro Kaneko 1

Satoru Katsumata 1

Venues

lrec2
acl1