Mohan Li

2026

Context-Aware Language Understanding in Human-Robot Dialogue with LLMs
Svetlana Stoyanchev | Youmna Farag | Simon Keizer | Mohan Li | Rama Sanand Doddipatla
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

In this work, we explore the use of large language models (LLMs) as interpreters of user utterances within a human-robot language interface. A user interacting with a robot that operates in a physical environment should be able to issue commands that interrupt the robot’s actions, for example, to correct or refine the task. This study addresses the context-aware interpretation of user utterances, including those issued while the robot is actively engaged in task execution, exploring whether LLMs, without fine-tuning, can translate user commands into corresponding sequences of robot actions. Using an interactive multimodal interface (combining text and video) for a virtual robot operating in simulated home environments, we collect a dataset of user utterances that guide the robot through various household tasks, simultaneously capturing manual interpretations when the automatic one fails. Driven by practical considerations, the collected dataset is used to compare the interpretive performance of GPT models with smaller publicly available alternatives. Our findings reveal that action-interrupting utterances pose challenges for all models. While GPT consistently outperforms the smaller models, interpretation accuracy improves across the board when relevant, dynamically selected in-context learning examples are included in the prompt.

2025

In addressing the intellectual property protection challenges in the commercial deployment of large language models (LLMs), existing black-box fingerprinting techniques face dual challenges from incremental fine-tuning erasure and feature-space defense due to their reliance on overfitting high-perplexity trigger patterns. We first reveal that model editing in the fingerprint domain exhibits unique advantages, including significantly lower false positive rates, enhanced harmlessness, and superior robustness. Building on this foundation, this paper proposes a Prefix-enhanced Fingerprint Editing Framework (PREE), which encodes copyright information into parameter offsets through dual-channel knowledge editing to achieve covert embedding of fingerprint features. Experimental results demonstrate that the proposed solution achieves 90% trigger precision in mainstream architectures including LLaMA-3 and Qwen-2.5. The minimal parameter offset (change rate < 0.03) effectively preserves the original knowledge representation while demonstrating strong robustness against incremental fine-tuning and multi-dimensional defense strategies, maintaining a zero false positive rate throughout evaluations.

Conditional Multi-Stage Failure Recovery for Embodied Agents
Youmna Farag | Svetlana Stoyanchev | Mohan Li | Simon Keizer | Rama Doddipatla
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

Embodied agents performing complex tasks are susceptible to execution failures, motivating the need for effective failure recovery mechanisms. In this work, we introduce a conditional multi-stage failure recovery framework that employs zero-shot chain prompting. The framework is structured into four error-handling stages, with three operating during task execution and one functioning as a post-execution reflection phase. Our approach utilises the reasoning capabilities of LLMs to analyse execution challenges within their environmental context and devise strategic solutions. We evaluate our method on the TfD benchmark of the TEACH dataset and achieve state-of-the-art performance, outperforming a baseline without error recovery by 11.5% and surpassing the strongest existing model by 19%.
