Juho Leinonen
2026
Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education
Mragisha Jain | Tirth Bhatt | Griffin Pitts | Aum Pandya | Peter Brusilovsky | Narges Norouzi | Arto Hellas | Juho Leinonen | Bita Akram
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Mragisha Jain | Tirth Bhatt | Griffin Pitts | Aum Pandya | Peter Brusilovsky | Narges Norouzi | Arto Hellas | Juho Leinonen | Bita Akram
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Students learning algorithms often need support as they interpret traces, debug reasoning errors, and apply procedures across unfamiliar problem instances. In this paper, we present KITE (Knowledge-Informed Tutoring Engine), a Retrieval-Augmented Generation (RAG)-based intelligent tutoring system designed to serve as a classroom teaching assistant for algorithmic reasoning and problem-solving tasks. KITE uses an intent-aware Socratic response strategy to tailor support to different student needs, responding with targeted hints, guiding questions, and progressive scaffolding intended to strengthen students’ algorithmic problem-solving ability. To keep responses aligned with course content, KITE uses a multimodal RAG pipeline that retrieves relevant information from course materials. We evaluate KITE using three forms of assessment: RAGAs-based metrics for response grounding and quality, expert evaluation of pedagogical quality, and a simulated student pipeline in which a weaker language model interacts with KITE across two-turn dialogues and produces revised answers after receiving feedback. Results indicate that KITE produces contextually grounded and pedagogically appropriate responses. Further, using simulated students, KITE’s feedback helped the student models produce more accurate follow-up responses on procedural and tracing questions, suggesting that its scaffolding can support algorithmic problem-solving. This work contributes a tutoring architecture and an evaluation approach for assessing retrieval-grounded explanations and scaffolded problem-solving feedback.
The Effects of Structured LLM-Generated Feedback on Programming Assignment Performance
Tsvetomila Mihaylova | Evanfiya Logacheva | Arto Hellas | Jing Fan | Francisco Castro | Bita Akram | Narges Norouzi | Peter Brusilovsky | Juho Leinonen
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Tsvetomila Mihaylova | Evanfiya Logacheva | Arto Hellas | Jing Fan | Francisco Castro | Bita Akram | Narges Norouzi | Peter Brusilovsky | Juho Leinonen
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
When programming students encounter errors in their code, compiler messages or static analysis output often provide limited guidance, particularly for novice programmers. Personalized feedback from instructors can be effective but does not scale well. Recent advances in large language models (LLMs) enable automated feedback generation at scale.This study examines whether LLM-generated feedback with different levels of guidance is associated with differences in students’ problem-solving behavior. We analyze effects on time to solution and number of attempts, and examine whether these effects differ by programming experience. We design three feedback types and compare them to a baseline in which students receive only compiler error messages. Results from an online programming course show that LLM-generated feedback is associated with faster time to solution compared to the no-feedback baseline, with less guided feedback showing slightly stronger effects. Overall, the findings suggest that feedback structure plays an important role in how students progress toward correct solutions and motivate further work on adaptive feedback designs and longer-term learning outcomes.
2022
Semiautomatic Speech Alignment for Under-Resourced Languages
Juho Leinonen | Niko Partanen | Sami Virpioja | Mikko Kurimo
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference
Juho Leinonen | Niko Partanen | Sami Virpioja | Mikko Kurimo
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference
Cross-language forced alignment is a solution for linguists who create speech corpora for very low-resource languages. However, cross-language is an additional challenge making a complex task, forced alignment, even more difficult. We study how linguists can impart domain expertise to the tasks to increase the performance of automatic forced aligners while keeping the time effort still lower than with manual forced alignment. First, we show that speech recognizers have a clear bias in starting the word later than a human annotator, which results in micro-pauses in the results that do not exist in manual alignments, and study which is the best way to automatically remove these silences. Second, we ask the linguists to simplify the task by splitting long interview audios into shorter lengths by providing some manually aligned segments and evaluating the results of this process. We also study how correlated source language performance is to target language performance, since often it is an easier task to find a better source model than to adapt to the target language.
2021
Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages
Juho Leinonen | Sami Virpioja | Mikko Kurimo
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Juho Leinonen | Sami Virpioja | Mikko Kurimo
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Forced alignment is an effective process to speed up linguistic research. However, most forced aligners are language-dependent, and under-resourced languages rarely have enough resources to train an acoustic model for an aligner. We present a new Finnish grapheme-based forced aligner and demonstrate its performance by aligning multiple Uralic languages and English as an unrelated language. We show that even a simple non-expert created grapheme-to-phoneme mapping can result in useful word alignments.
2020
Service registration chatbot: collecting and comparing dialogues from AMT workers and service’s users
Luca Molteni | Mittul Singh | Juho Leinonen | Katri Leino | Mikko Kurimo | Emanuele Della Valle
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Luca Molteni | Mittul Singh | Juho Leinonen | Katri Leino | Mikko Kurimo | Emanuele Della Valle
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Crowdsourcing is the go-to solution for data collection and annotation in the context of NLP tasks. Nevertheless, crowdsourced data is noisy by nature; the source is often unknown and additional validation work is performed to guarantee the dataset’s quality. In this article, we compare two crowdsourcing sources on a dialogue paraphrasing task revolving around a chatbot service. We observe that workers hired on crowdsourcing platforms produce lexically poorer and less diverse rewrites than service users engaged voluntarily. Notably enough, on dialogue clarity and optimality, the two paraphrase sources’ human-perceived quality does not differ significantly. Furthermore, for the chatbot service, the combined crowdsourced data is enough to train a transformer-based Natural Language Generation (NLG) system. To enable similar services, we also release tools for collecting data and training the dialogue-act-based transformer-based NLG module.