Engaging students in a coherent classroom discussion is one aspect of high-quality instruction and an important skill that requires practice to acquire. With the goal of providing teachers with formative feedback on their classroom discussions, we investigate automated means of evaluating teachers’ ability to lead coherent discussions in simulated classrooms. While prior work has shown that large language models (LLMs) can effectively assess the coherence of relatively short texts, it has also found that LLMs struggle when assessing instructional quality. We evaluate how well task formulation strategies for assessing the coherence of classroom discussions generalize across subject domains using GPT-4o, and we discuss how these formulations address two previously reported challenges: the overestimation of instructional quality and the inability to extract relevant parts of discussions. Finally, we identify a lack of generalizability across domains and misalignment with humans in the use of evidence from discussions as remaining challenges.
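The abstract does not spell out its task formulations, but as a rough illustration, one such formulation might pair a scoring rubric with the discussion transcript in a single GPT-4o call. The rubric wording, the 1-4 scale, and the rate_coherence helper below are hypothetical, a minimal sketch rather than the paper’s actual prompts:

```python
# Hypothetical sketch of a rubric-based task formulation for rating
# discussion coherence with GPT-4o. The rubric text and 1-4 scale are
# illustrative, not the formulation used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Rate the coherence of the teacher's classroom discussion on a 1-4 scale:\n"
    "1 = turns are disconnected; 4 = ideas are consistently linked and\n"
    "student contributions are built upon. First quote the utterances you\n"
    "used as evidence, then give the rating on a final line as 'Score: N'."
)

def rate_coherence(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic scoring for formative feedback
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```

Asking the model to quote its evidence before scoring is one plausible way to probe the evidence-use misalignment the abstract reports, since the quoted utterances can be checked against human annotations.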
General-purpose automatic speech recognition (ASR) systems do not always perform well in goal-oriented dialogue. Existing ASR correction methods rely on prior user data or named entities. We extend correction to tasks that have no prior user data and that exhibit linguistic flexibility, such as lexical and syntactic variation. We propose a novel context-augmentation method based on a large language model, together with a ranking strategy that incorporates contextual information from the dialogue states of a goal-oriented conversational AI and its tasks. Our method ranks (1) n-best ASR hypotheses by their lexical and semantic similarity with the context and (2) context by its phonetic correspondence with the ASR hypotheses. Evaluated in the home improvement and cooking domains with real-world users, our method improves the recall and F1 of correction by 34% and 16%, respectively, while maintaining precision and the false positive rate. Users rated the system 0.8 to 1 point (out of 5) higher when our correction method worked properly, with no decrease due to false positives.
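As a toy illustration of the two ranking passes described above (the paper’s actual similarity functions are not given here): token overlap stands in for lexical/semantic similarity, and Metaphone codes stand in for phonetic correspondence; in practice, the semantic term would likely use sentence embeddings. All function names are illustrative:

```python
# Toy sketch of the two-way ranking: hypotheses are scored against
# context terms, then context terms are scored against the hypotheses.
import difflib
import jellyfish  # pip install jellyfish

def lexical_score(hypothesis: str, context_terms: list[str]) -> float:
    # Fraction of context terms that appear verbatim in the hypothesis.
    tokens = set(hypothesis.lower().split())
    hits = sum(1 for term in context_terms if term.lower() in tokens)
    return hits / max(len(context_terms), 1)

def phonetic_score(term: str, hypothesis: str) -> float:
    # Best phonetic match between a context term and any ASR token.
    code = jellyfish.metaphone(term)
    codes = [jellyfish.metaphone(tok) for tok in hypothesis.split()]
    ratios = [difflib.SequenceMatcher(None, code, c).ratio() for c in codes]
    return max(ratios, default=0.0)

def rank(nbest: list[str], context_terms: list[str]):
    # (1) rank n-best ASR hypotheses by similarity with the dialogue context
    hyps = sorted(nbest, key=lambda h: lexical_score(h, context_terms),
                  reverse=True)
    # (2) rank context terms by phonetic correspondence with the top hypothesis
    terms = sorted(context_terms,
                   key=lambda t: phonetic_score(t, hyps[0]), reverse=True)
    return hyps, terms
```

The phonetic pass is what allows a contextually plausible term to replace an acoustically confusable misrecognition even when it never appears in any hypothesis verbatim.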
Large language models (LLMs) have emerged as powerful tools for developing educational systems. While previous studies have explored modeling student mistakes, a critical gap remains in understanding whether LLMs can generate correct solutions that are representative of student responses to free-response problems. In this paper, we compare the distribution of solutions produced by four LLMs (one proprietary model, two open-source general-purpose models, and one open-source math model) under various sampling and prompting techniques with the distribution of solutions generated by students, using conversations in which students teach math problems to a conversational robot. Our study reveals discrepancies between the correct solutions produced by LLMs and those produced by students. We discuss the practical implications of these findings for the design and evaluation of LLM-supported educational systems.
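A minimal sketch of this kind of distributional comparison, assuming each solution has already been canonicalized to a strategy label, might look as follows; total variation distance is an illustrative choice of metric, not necessarily the one used in the paper, and the strategy labels are made up:

```python
# Compare how often each solution strategy appears in LLM-sampled
# solutions versus student-produced solutions.
from collections import Counter

def distribution(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

llm_solutions = ["distributive", "distributive", "factoring", "guess-check"]
student_solutions = ["guess-check", "guess-check", "distributive", "tables"]
gap = total_variation(distribution(llm_solutions),
                      distribution(student_solutions))
print(f"Distributional gap: {gap:.2f}")  # larger = bigger LLM-student mismatch
```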
Speakers build rapport in the process of aligning conversational behaviors with each other. Rapport built with a teachable agent while instructing it in domain material has been shown to promote learning. Past work on lexical alignment in education suffers from limitations in both the measures used to quantify alignment and the types of interactions in which alignment with agents has been studied. In this paper, we apply alignment measures based on a data-driven notion of shared expressions (possibly composed of multiple words) and compare alignment in one-on-one human-robot (H-R) interactions with the H-R portions of collaborative human-human-robot (H-H-R) interactions. We find that students in the H-R setting align with a teachable robot more than in the H-H-R setting, and that the relationship between lexical alignment and rapport is more complex than predicted by previous theoretical and empirical work.
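A toy version of a shared-expression alignment measure might count the fraction of one speaker’s multi-word n-grams that the partner has already produced. The paper’s data-driven measure is more sophisticated than this, and the functions below are illustrative only:

```python
# Toy shared-expression alignment: the fraction of a speaker's n-grams
# (n >= 2) that were previously produced by the conversational partner.
def ngrams(utterance: str, n: int) -> set[tuple[str, ...]]:
    tokens = utterance.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def alignment(speaker_turns: list[str], partner_turns: list[str],
              max_n: int = 3) -> float:
    # Pool all expressions the partner has used across their turns.
    partner_exprs: set[tuple[str, ...]] = set()
    for turn in partner_turns:
        for n in range(2, max_n + 1):
            partner_exprs |= ngrams(turn, n)
    shared = used = 0
    for turn in speaker_turns:
        for n in range(2, max_n + 1):
            for expr in ngrams(turn, n):
                used += 1
                shared += expr in partner_exprs
    return shared / used if used else 0.0
```

Restricting to expressions of two or more words keeps the measure from being dominated by incidental single-word overlap, which is one reason multi-word shared expressions are a useful unit for studying alignment.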