Otto Tarkka


2024

pdf
Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4
Otto Tarkka | Jaakko Koljonen | Markus Korhonen | Juuso Laine | Kristian Martiskainen | Kimmo Elo | Veronika Laippala
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

In this paper, we test the efficacy of using GPT-4 to annotate a dataset that is the used to train a BERT classifier for emotion analysis. Manual data annotation is often a laborious and expensive task and emotion annotation, specifically, has proved difficult even for expert annotators. We show that using GPT-4 can produce equally good results as doing data annotation manually while saving a lot of time and money. We train a BERT classifier on our automatically annotated dataset and get results that outperform a BERT classifier that is trained on machine translated data. Our paper shows how Large Language Models can be used to work with and analyse parliamentary corpora.

2021

pdf
Finnish Paraphrase Corpus
Jenna Kanerva | Filip Ginter | Li-Hsin Chang | Iiro Rastas | Valtteri Skantsi | Jemina Kilpeläinen | Hanna-Mari Kupari | Jenna Saarni | Maija Sevón | Otto Tarkka
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high quality paraphrase selection in terms of both cost and quality.