Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents
Niklas Hemken, Sai Koneru, Florian Jacob, Hannes Hartenstein, Jan Niehues
Abstract
Agents controlled by Large Language Models (LLMs) can assist with natural language tasks across domains and applications when given access to confidential data.When such digital assistants interact with their potentially adversarial environment, confidentiality of the data is at stake.We investigated whether an LLM-controlled agent can, in a manner similar to humans, consider confidentiality when responding to natural language requests involving internal data.For evaluation, we created a synthetic dataset consisting of confidentiality-aware planning and deduction tasks in organizational access control.The dataset was developed from human input, LLM-generated content, and existing datasets.It includes various everyday scenarios in which access to confidential or private information is requested.We utilized our dataset to evaluate the ability to infer confidentiality-aware behavior in such scenarios by differentiating between legitimate and illegitimate access requests.We compared a prompting-based and a fine-tuning-based approach, to evaluate the performance of Llama 3 and GPT-4o-mini in this domain.In addition, we conducted a user study to establish a baseline for human evaluation performance in these tasks. We found humans reached an accuracy of up to 79%.Prompting techniques, such as chain-of-thought and few-shot prompting, yielded promising results, but still fell short of real-world applicability and do not surpass human baseline performance. However, we found that fine-tuning significantly improves the agent’s access decisions, reaching up to 98% accuracy, making it promising for future confidentiality-aware applications when data is available.- Anthology ID:
- 2025.acl-srw.49
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Jin Zhao, Mingyang Wang, Zhu Liu
- Venues:
- ACL | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 746–759
- Language:
- URL:
- https://preview.aclanthology.org/landing_page/2025.acl-srw.49/
- DOI:
- Cite (ACL):
- Niklas Hemken, Sai Koneru, Florian Jacob, Hannes Hartenstein, and Jan Niehues. 2025. Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 746–759, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents (Hemken et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.acl-srw.49.pdf