Juho Leinonen


2021

Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages
Juho Leinonen | Sami Virpioja | Mikko Kurimo
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Forced alignment is an effective way to speed up linguistic research. However, most forced aligners are language-dependent, and under-resourced languages rarely have enough resources to train an acoustic model for an aligner. We present a new Finnish grapheme-based forced aligner and demonstrate its performance by aligning multiple Uralic languages and English as an unrelated language. We show that even a simple grapheme-to-phoneme mapping created by a non-expert can result in useful word alignments.

2020

Service registration chatbot: collecting and comparing dialogues from AMT workers and service’s users
Luca Molteni | Mittul Singh | Juho Leinonen | Katri Leino | Mikko Kurimo | Emanuele Della Valle
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Crowdsourcing is the go-to solution for data collection and annotation in the context of NLP tasks. Nevertheless, crowdsourced data is noisy by nature; the source is often unknown and additional validation work is performed to guarantee the dataset’s quality. In this article, we compare two crowdsourcing sources on a dialogue paraphrasing task revolving around a chatbot service. We observe that workers hired on crowdsourcing platforms produce lexically poorer and less diverse rewrites than service users engaged voluntarily. Notably, the human-perceived quality of the two paraphrase sources does not differ significantly in dialogue clarity and optimality. Furthermore, for the chatbot service, the combined crowdsourced data is enough to train a transformer-based Natural Language Generation (NLG) system. To enable similar services, we also release tools for collecting data and training the transformer-based, dialogue-act-based NLG module.

2018

New Baseline in Automatic Speech Recognition for Northern Sámi
Juho Leinonen | Peter Smit | Sami Virpioja | Mikko Kurimo
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages