2025
ToolReAGt: Tool Retrieval for LLM-based Complex Task Solution via Retrieval Augmented Generation
Norbert Braunschweiler | Rama Doddipatla | Tudor-Catalin Zorila
Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
Artificial intelligence agents, when deployed to solve complex problems, need to first decompose the task into smaller, manageable sub-tasks and then associate a tool with each sub-task that requires one. If the set of tools to choose from is large, a retrieval system is usually employed to narrow down the tool choices before the LLM can proceed with associating tools to the sub-tasks. This paper focuses on the retrieval problem of identifying the set of relevant tools for solving a complex task, given a large pool of tools to choose from, using retrieval augmented generation (RAG); we refer to the approach as ToolReAGt. The proposed approach employs ReAct prompting to perform the retrieval in an iterative fashion, first identifying whether a tool is required and then associating one or more tools with each sub-task. This deviates from conventional RAG, where an n-best list of tools is identified directly from the complex task. Experiments are presented on the UltraTool benchmark corpus with 1000 complex tasks and over 2000 tools to select from. A conventional RAG system is established as the baseline and compared to the ToolReAGt approach, which achieves an 8.9% improvement in recall@5 retrieval accuracy.
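No reference implementation accompanies this abstract; the following is a minimal sketch of the iterative, ReAct-style retrieve-per-sub-task loop it describes. The `llm` and `embed` callables, the prompt wording, and the top-k union are illustrative assumptions, not the authors' code.

```python
from typing import Callable

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def toolreagt_retrieve(
    task: str,
    tools: list[str],                    # natural-language tool descriptions
    llm: Callable[[str], str],           # hypothetical LLM call: prompt -> text
    embed: Callable[[str], np.ndarray],  # hypothetical sentence embedder
    k: int = 5,
) -> list[str]:
    """Iterative ReAct-style retrieval: decompose the task, then retrieve
    tools per sub-task rather than once for the whole task."""
    tool_vecs = [embed(t) for t in tools]
    # Thought: ask the LLM to break the complex task into sub-tasks.
    sub_tasks = llm(f"Decompose into numbered sub-tasks:\n{task}").splitlines()
    selected: list[str] = []
    for sub in filter(None, (s.strip() for s in sub_tasks)):
        # Reason: does this sub-task need a tool at all?
        if "yes" not in llm(f"Does this sub-task need a tool? yes/no\n{sub}").lower():
            continue
        # Act: retrieve the tool descriptions closest to this sub-task.
        q = embed(sub)
        ranked = sorted(range(len(tools)), key=lambda i: -cosine(q, tool_vecs[i]))
        selected.extend(tools[i] for i in ranked[:k] if tools[i] not in selected)
    return selected[:k]  # top-k union, as scored by recall@5 in the paper
```

A conventional RAG baseline, by contrast, would embed the full task once and return the n-best tools directly.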
Conditional Multi-Stage Failure Recovery for Embodied Agents
Youmna Farag | Svetlana Stoyanchev | Mohan Li | Simon Keizer | Rama Doddipatla
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Embodied agents performing complex tasks are susceptible to execution failures, motivating the need for effective failure recovery mechanisms. In this work, we introduce a conditional multi-stage failure recovery framework that employs zero-shot chain prompting. The framework is structured into four error-handling stages, with three operating during task execution and one functioning as a post-execution reflection phase. Our approach utilises the reasoning capabilities of LLMs to analyse execution challenges within their environmental context and devise strategic solutions. We evaluate our method on the TfD benchmark of the TEACh dataset and achieve state-of-the-art performance, outperforming a baseline without error recovery by 11.5% and surpassing the strongest existing model by 19%.
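The abstract does not specify the stages' content; below is a minimal sketch of the conditional staging idea, assuming hypothetical `execute` and `llm` helpers and illustrative stage prompts. The three in-execution stages fire only if the previous attempt failed, followed by a single reflection pass.

```python
from typing import Callable


def run_with_recovery(
    plan: list[str],
    execute: Callable[[str], tuple[bool, str]],  # hypothetical: action -> (ok, error)
    llm: Callable[[str], str],                   # hypothetical LLM call
    stages: int = 3,
) -> list[str]:
    """Conditional multi-stage recovery: each in-execution stage runs only
    if the previous attempt failed; a reflection stage runs afterwards."""
    log: list[str] = []
    for action in plan:
        ok, err = execute(action)
        for stage in range(1, stages + 1):  # stages 1-3: during execution
            if ok:
                break
            # Zero-shot chain prompt: analyse the failure in its context
            # and propose a corrected action.
            fix = llm(
                f"Stage {stage} recovery: action '{action}' failed with '{err}'. "
                "Propose one corrected action."
            )
            ok, err = execute(fix)
            log.append(f"stage {stage}: {fix} -> {'ok' if ok else err}")
    # Stage 4: post-execution reflection over the whole episode.
    log.append(llm("Reflect on this execution log:\n" + "\n".join(log)))
    return log
```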
2023
Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues
Norbert Braunschweiler | Rama Doddipatla | Simon Keizer | Svetlana Stoyanchev
Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants!
In this paper, we investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues. For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains, previously used in the DialDoc 2022 Shared Task. Information-seeking dialogue turns are grounded in multiple documents providing relevant information. We generate dialogue completion responses by prompting a ChatGPT model, using two methods: ChatCompletion and LlamaIndex. ChatCompletion uses knowledge from ChatGPT model pre-training, while LlamaIndex also extracts relevant information from documents. Observing that document-grounded response generation via LLMs cannot be adequately assessed by automatic evaluation metrics, as the LLM responses are significantly more verbose, we perform a human evaluation in which annotators rate the output of the shared task winning system, the outputs of the two ChatGPT variants, and human responses. While both ChatGPT variants are more likely to include information not present in the relevant segments, possibly indicating hallucinations, they are rated higher than both the shared task winning system and human responses.
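As a rough illustration of the two prompting methods compared above, here is a sketch with hypothetical `chat` and `retrieve` helpers standing in for a ChatGPT completion call and a LlamaIndex-style segment retriever; the prompt text is an assumption, not the paper's.

```python
from typing import Callable


def respond_parametric(dialogue: str, chat: Callable[[str], str]) -> str:
    """ChatCompletion-style: answer from the model's pre-training alone."""
    return chat(f"Continue this information-seeking dialogue:\n{dialogue}\nAgent:")


def respond_grounded(
    dialogue: str,
    chat: Callable[[str], str],                 # hypothetical ChatGPT wrapper
    retrieve: Callable[[str, int], list[str]],  # hypothetical segment retriever
    k: int = 3,
) -> str:
    """LlamaIndex-style: retrieve relevant document segments first, then
    generate a response grounded in them."""
    passages = "\n".join(f"- {s}" for s in retrieve(dialogue, k))
    return chat(
        "Answer using only the grounding passages below.\n"
        f"Passages:\n{passages}\n\nDialogue:\n{dialogue}\nAgent:"
    )
```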
2014
The USFD SLT system for IWSLT 2014
Raymond W. M. Ng | Mortaza Doulaty | Rama Doddipatla | Wilker Aziz | Kashif Shah | Oscar Saz | Madina Hasan | Ghada AlHaribi | Lucia Specia | Thomas Hain
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
The University of Sheffield (USFD) participated in the International Workshop on Spoken Language Translation (IWSLT) in 2014. In this paper, we introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives further improvements of 0.54 and 0.26 BLEU on the IWSLT 2012 and 2014 evaluation data, respectively.
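The contrastive systems' QE rescoring can be pictured as the sketch below; `translate` and `qe_score` are hypothetical stand-ins for the phrase-based MT system and the quality estimation model, and the linear interpolation weight is an assumption.

```python
from typing import Callable


def qe_rescore(
    nbest: list[tuple[str, float]],         # (ASR hypothesis, ASR log score)
    translate: Callable[[str], str],        # hypothetical phrase-based MT call
    qe_score: Callable[[str, str], float],  # hypothetical QE: (source, MT) -> quality
    weight: float = 0.5,                    # interpolation weight (assumed)
) -> str:
    """Rescore an ASR n-best list towards better translation quality:
    interpolate each hypothesis's ASR score with a QE score of its
    translation, then keep the best hypothesis for downstream MT."""
    assert nbest, "expects a non-empty n-best list"
    best_hyp, best = nbest[0][0], float("-inf")
    for hyp, asr in nbest:
        combined = (1 - weight) * asr + weight * qe_score(hyp, translate(hyp))
        if combined > best:
            best_hyp, best = hyp, combined
    return best_hyp
```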