Kaustubh Adhikari


2025

Human ratings of LLM response generation in pair-programming dialogue
Cecilia Domingo | Paul Piwek | Svetlana Stoyanchev | Michel Wermelinger | Kaustubh Adhikari | Rama Sanand Doddipatla
Proceedings of the 18th International Natural Language Generation Conference

We take first steps in exploring whether Large Language Models (LLMs) can be adapted to dialogic learning practices, specifically pair programming — to date, LLMs have primarily been implemented as programming assistants, not fully exploiting their dialogic potential. We used new dialogue data from real pair-programming interactions between students, prompting state-of-the-art LLMs to assume the role of a student when generating a response that continues the real dialogue. We asked human annotators to rate human and AI responses on the criteria through which we operationalise the LLMs’ suitability for educational dialogue: Coherence, Collaborativeness, and whether they appeared human. Results show differences between models, with Llama-generated responses rated similarly to human answers on all three criteria. Thus, for at least one of the models we investigated, utterance-level LLM response generation appears to be suitable for pair-programming dialogue.