Teemu Vahtola


2021

Grammatical Error Generation Based on Translated Fragments
Eetu Sjöblom | Mathias Creutz | Teemu Vahtola
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language in comparison to a state-of-the-art baseline model. We carry out quantitative and qualitative evaluation. Our method is shown to outperform the baseline on data with a high proportion of errors.

Coping with Noisy Training Data Labels in Paraphrase Detection
Teemu Vahtola | Mathias Creutz | Eetu Sjöblom | Sami Itkonen
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish. We reach these baselines by fine-tuning BERT. The best results are achieved on smaller and cleaner subsets of the training sets than was observed in previous research. Additionally, we study a translation-based approach that is competitive for the languages with more limited and noisier training data.