Kranti Chalamalasetti
Also published as:
Chalamalasetti Kranti
The emergence of instruction-tuned large language models (LLMs) has advanced the field of dialogue systems, enabling both realistic user simulations and robust multi-turn conversational agents. However, existing research often evaluates these components in isolation, either focusing on a single user simulator or a specific system design, limiting the generalisability of insights across architectures and configurations. In this work, we propose clem:todd (chat-optimized LLMs for task-oriented dialogue systems development), a flexible framework for systematically evaluating dialogue systems under consistent conditions. clem:todd enables detailed benchmarking across combinations of user simulators and dialogue systems, whether existing models from the literature or newly developed ones. To the best of our knowledge, clem:todd is the first evaluation framework for task-oriented dialogue systems that supports plug-and-play integration and ensures uniform datasets, evaluation metrics, and computational constraints. We showcase clem:todd’s flexibility by re-evaluating existing task-oriented dialogue systems within this unified setup and integrating three newly proposed dialogue systems into the same evaluation pipeline. Our results provide actionable insights into how architecture, scale, and prompting strategies affect dialogue performance, offering practical guidance for building efficient and effective conversational AI systems.
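To make the plug-and-play idea concrete, the sketch below pairs an arbitrary user simulator with an arbitrary dialogue system under one shared interaction loop and turn budget. The class and function names (UserSimulator, DialogueSystem, run_dialogue) and the stub utterances are illustrative assumptions, not the actual clem:todd API.

# Illustrative sketch only: hypothetical interfaces, not the actual clem:todd API.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # "user" or "system"
    text: str


class UserSimulator:
    """Hypothetical user simulator: produces the next user utterance given a goal and history."""
    def __init__(self, goal: str):
        self.goal = goal

    def next_utterance(self, history: list[Turn]) -> str:
        # A real simulator would prompt an LLM with the goal and the dialogue history.
        return "I need a cheap restaurant in the city centre."


class DialogueSystem:
    """Hypothetical task-oriented dialogue system: responds to the latest user utterance."""
    def respond(self, history: list[Turn]) -> str:
        # A real system would track dialogue state, query a database, and generate a response.
        return "There are three cheap restaurants in the centre. Any cuisine preference?"


def run_dialogue(user: UserSimulator, system: DialogueSystem, max_turns: int = 5) -> list[Turn]:
    """Pair any simulator with any system under the same loop and turn budget."""
    history: list[Turn] = []
    for _ in range(max_turns):
        history.append(Turn("user", user.next_utterance(history)))
        history.append(Turn("system", system.respond(history)))
    return history


if __name__ == "__main__":
    transcript = run_dialogue(UserSimulator(goal="find a cheap restaurant"), DialogueSystem())
    for t in transcript[:4]:
        print(f"{t.speaker}: {t.text}")

Because both components only see the shared Turn history, any simulator/system pairing can be benchmarked under identical datasets, metrics, and turn limits.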
Spoken dialogue systems (SDSs) aim to enable natural, interactive and collaborative conversations. My research interest lies in leveraging these situated collaborative conversations to teach new concepts (skills) to collaborative robots (cobots). These cobots, when operating in manufacturing environments such as assembly lines, are envisioned to converse with humans, reach common ground, and learn new skills in one shot without the need for multiple demonstrations. Unlike SDSs in consumer domains, these cobot-based systems must handle conversations in noisy, time-sensitive industrial settings. Motivated by these challenges, my research focuses on building collaborative dialogue systems capable of integrating conversational programming to translate situated dialogue into modular programs, knowing when to ask for clarifications, and adapting the program based on corrections.
In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder. Leveraging LLMs’ in-context learning abilities, we use few-shot prompting techniques that significantly improve performance over baseline methods. Additionally, we present a detailed analysis of the gaps in performance to inform future work.
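The sketch below illustrates the general few-shot prompting idea: worked Architect-instruction/Builder-action pairs are prepended to the current dialogue context before querying an LLM. The example records, action format, and the commented-out completion call are placeholders, not the paper's actual prompts or data.

# Minimal sketch of few-shot prompting for Builder action prediction (illustrative only).
def build_fewshot_prompt(examples, architect_instruction, history):
    """Assemble a few-shot prompt: worked examples, then the current dialogue context."""
    parts = []
    for ex in examples:
        parts.append(f"Architect: {ex['instruction']}\nBuilder actions: {ex['actions']}")
    parts.append("Dialogue so far:\n" + "\n".join(history))
    parts.append(f"Architect: {architect_instruction}\nBuilder actions:")
    return "\n\n".join(parts)


examples = [
    {"instruction": "place a red block at the centre of the grid",
     "actions": "putdown red 0 1 0"},
    {"instruction": "stack a blue block on top of it",
     "actions": "putdown blue 0 2 0"},
]

prompt = build_fewshot_prompt(
    examples,
    architect_instruction="now add a red block to the left of the blue one",
    history=["Architect: let's build a small tower", "Builder: ok"],
)
print(prompt)
# predicted_actions = complete_fn(prompt)  # call any LLM completion API here (placeholder)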
Recent work has proposed a methodology for the systematic evaluation of “Situated Language Understanding Agents” — agents that operate in rich linguistic and non-linguistic contexts — through testing them in carefully constructed interactive settings. Other recent work has argued that Large Language Models (LLMs), if suitably set up, can be understood as (simulators of) such agents. A connection suggests itself, which this paper explores: Can LLMs be evaluated meaningfully by exposing them to constrained game-like settings that are built to challenge specific capabilities? As a proof of concept, this paper investigates five interaction settings, showing that current chat-optimised LLMs are, to an extent, capable of following game-play instructions. Both this capability and the quality of the game play, measured by how well the objectives of the different games are met, follow the development cycle, with newer models generally performing better. The metrics even for the comparatively simple example games are far from being saturated, suggesting that the proposed instrument will continue to have diagnostic value.
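As a hypothetical illustration of such an evaluation instrument, the toy episode below has a scripted game master check whether a model's reply follows the required move format and then score the episode. The word-guessing game, the regex check, and the returned fields are assumptions for illustration, not the benchmark's actual games or metrics.

# Toy sketch of a game-master loop that tests instruction following and game success.
import re

def play_episode(model_fn, target_word: str, max_guesses: int = 3) -> dict:
    """A toy word-guessing game: the model must answer with 'GUESS: <word>' each turn."""
    prompt = "Guess the secret fruit. Answer only with 'GUESS: <word>'."
    for turn in range(1, max_guesses + 1):
        reply = model_fn(prompt)
        match = re.match(r"GUESS:\s*(\w+)", reply.strip())
        if match is None:
            # Instruction not followed: the episode is aborted and counted as not played.
            return {"followed_rules": False, "success": False, "turns": turn}
        if match.group(1).lower() == target_word:
            return {"followed_rules": True, "success": True, "turns": turn}
        prompt = f"'{match.group(1)}' is wrong. Try again. Answer only with 'GUESS: <word>'."
    return {"followed_rules": True, "success": False, "turns": max_guesses}


# Usage with a trivial stand-in for an LLM:
print(play_episode(lambda p: "GUESS: banana", target_word="banana"))

Separating rule-following from goal attainment in this way lets the same episode report both whether a model can play at all and how well it plays.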
Word emphasis in textual content aims at conveying the desired intention by changing the size, color, typeface, style (bold, italic, etc.), and other typographical features. The emphasized words are extremely helpful in drawing the readers’ attention to specific information that the authors wish to emphasize. However, performing such emphasis using a soft keyboard for social media interactions is time-consuming and has an associated learning curve. In this paper, we propose a novel approach to automate emphasis word detection on short written texts. To the best of our knowledge, this work presents the first lightweight deep learning approach for smartphone deployment of emphasis selection. Experimental results show that our approach achieves accuracy comparable to existing models at a much smaller model size. Our best lightweight model has a memory footprint of 2.82 MB with a matching score of 0.716 on the SemEval-2020 public benchmark dataset.
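For illustration only, the following PyTorch sketch shows the kind of compact token-level emphasis scorer such a lightweight setup might use; the architecture, layer sizes, and parameter count shown are assumptions, not the paper's actual model.

# Hypothetical compact emphasis tagger (not the paper's model): a small BiLSTM that scores
# each token's probability of being emphasized.
import torch
import torch.nn as nn

class TinyEmphasisTagger(nn.Module):
    def __init__(self, vocab_size: int = 5000, emb_dim: int = 50, hidden: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)  # one emphasis score per token

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.emb(token_ids))          # (batch, seq_len, 2 * hidden)
        return torch.sigmoid(self.out(h)).squeeze(-1)  # (batch, seq_len) scores in [0, 1]

model = TinyEmphasisTagger()
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}")  # a small parameter count keeps the on-device footprint low
scores = model(torch.randint(0, 5000, (1, 8)))  # emphasis probability for each of 8 tokens
print(scores.shape)

Keeping the embedding and recurrent layers this small is what makes a megabyte-scale, smartphone-deployable model plausible, at the cost of some accuracy relative to larger models.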