Artur Janicki


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
Jakub Hoscilowicz | Artur Janicki
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

With the growing reliance on digital devices with graphical user interfaces (GUIs) like computers and smartphones, the demand for smart voice assistants has grown significantly. While multimodal large language models (MLLM) like GPT-4V excel in many areas, they struggle with GUI interactions, limiting their effectiveness in automating everyday tasks. In this work, we introduce ClickAgent, a novel framework for building autonomous agents. ClickAgent combines MLLM-driven reasoning and action planning with a separate UI location model that identifies relevant UI elements on the screen. This approach addresses a key limitation of current MLLMs: their inability to accurately locate UI elements. Evaluations conducted using both an Android emulator and a real smartphone show that ClickAgent outperforms other autonomous agents (DigiRL, CogAgent, AppAgent) on the AITW benchmark.