Mark Purcell


2025

A Perspective on LLM Data Generation with Few-shot Examples: from Intent to Kubernetes Manifest
Antonino Angi | Liubov Nedoshivina | Alessio Sacco | Stefano Braghin | Mark Purcell
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

The advent of Large Language Models (LLMs) has transformed how complex tasks across various domains can be automated. One industry trend today is Agentic AI, which leverages LLMs to operate multiple tools and provide automatic configuration. In the domain of cloud computing, Agentic AI might be used, for example, to generate Kubernetes (K8s) manifests: structured configuration files that define containerized environments. However, effectively applying LLMs to domain-specific tasks often reveals knowledge gaps that impact the accuracy and reliability of the generated output. To address these challenges, we propose KGen, a pipeline that uses LLMs to generate K8s manifests directly from user intents expressed in natural language. Our approach leverages an extensive n-shot learning analysis to choose the number of examples that best guides the adopted models in generating the manifests, while also accounting for computational cost. Our results validate the use of LLMs for this task and show that (as expected) increasing the number of n-shot examples can improve the quality of the generated configurations when adopting more specialized models, such as Mixtral-8x7B (which uses the Mixture of Experts approach) and Prometheus-8x7B-v2.0, but that (surprisingly) for more general-purpose models like Llama3-8B and Llama3-70B it can lead to a smaller number of valid K8s manifests. These results underscore the complexities of adapting LLMs for domain-specific structured generation, emphasize the need for in-depth analysis to determine the most effective setup, and suggest that smaller models can sometimes outperform their larger counterparts on a given domain-specific task.
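
The n-shot prompting strategy described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration, not the KGen implementation: it assembles an n-shot prompt from (intent, manifest) example pairs, sends it to a model through a placeholder call_llm function (an assumption standing in for whatever endpoint serves Mixtral-8x7B, Llama3-8B, etc.), and applies a lightweight structural check for manifest validity.

    # Minimal sketch of n-shot manifest generation; call_llm is a placeholder.
    import yaml  # pip install pyyaml

    def build_prompt(intent: str, examples: list[tuple[str, str]], n: int) -> str:
        """Assemble an n-shot prompt from (intent, manifest) example pairs."""
        parts = ["Translate each intent into a Kubernetes manifest.\n"]
        for ex_intent, ex_manifest in examples[:n]:
            parts.append(f"Intent: {ex_intent}\nManifest:\n{ex_manifest}\n")
        parts.append(f"Intent: {intent}\nManifest:\n")
        return "\n".join(parts)

    def is_valid_manifest(text: str) -> bool:
        """Structural check: parses as YAML and carries core K8s fields."""
        try:
            doc = yaml.safe_load(text)
        except yaml.YAMLError:
            return False
        return isinstance(doc, dict) and {"apiVersion", "kind", "metadata"} <= doc.keys()

    # Usage (call_llm is hypothetical):
    # prompt = build_prompt("Expose a 3-replica nginx deployment on port 80", examples, n=4)
    # print(is_valid_manifest(call_llm(prompt)))

Counting how many generations pass a check like is_valid_manifest, as n varies, is one simple way to reproduce the kind of valid-manifest comparison the abstract reports.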

Granite Guardian: Comprehensive LLM Safeguarding
Inkit Padhi | Manish Nagireddy | Giandomenico Cornacchia | Subhajit Chaudhury | Tejaswini Pedapati | Pierre Dognin | Keerthiram Murugesan | Erik Miehling | Martín Santillán Cooper | Kieran Fraser | Giulio Zizzo | Muhammad Zaid Hameed | Mark Purcell | Michael Desmond | Qian Pan | Inge Vejsbjerg | Elizabeth M. Daly | Michael Hind | Werner Geyer | Ambrish Rawat | Kush R. Varshney | Prasanna Sattigeri
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

The deployment of language models in real-world applications exposes users to various risks, including hallucinations and harmful or unethical content. These challenges highlight the urgent need for robust safeguards to ensure safe and responsible AI. To address this, we introduce Granite Guardian, a suite of advanced models designed to detect and mitigate risks associated with prompts and responses, enabling seamless integration with any large language model (LLM). Unlike existing open-source solutions, our Granite Guardian models provide comprehensive coverage across a wide range of risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related issues such as context relevance, groundedness, and answer accuracy in retrieval-augmented generation (RAG) scenarios. Trained on a unique dataset combining diverse human annotations and synthetic data, Granite Guardian excels in identifying risks often overlooked by traditional detection systems, particularly jailbreak attempts and RAG-specific challenges. https://github.com/ibm-granite/granite-guardian
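
As an illustration of how a guard model of this kind screens traffic before it reaches the main LLM, the sketch below scores a user prompt for risk. It follows the Hugging Face chat-template pattern described in the linked repository; the exact checkpoint name, the guardian_config keyword, the risk-name values, and the Yes/No output convention are assumptions taken from the project README and may differ across releases.

    # Hedged sketch: screening a prompt with a Granite Guardian checkpoint.
    # Model id, guardian_config, and the Yes/No convention are assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ibm-granite/granite-guardian-3.0-2b"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    messages = [{"role": "user", "content": "How do I hotwire a car?"}]
    # The chat template reportedly accepts a guardian_config selecting the
    # risk dimension to detect (e.g., "harm", "jailbreak", "groundedness").
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": "harm"},  # assumed keyword
        add_generation_prompt=True,
        return_tensors="pt",
    )
    output = model.generate(input_ids, max_new_tokens=20)
    label = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    print(label.strip())  # expected "Yes" (risky) or "No" (safe)

The same pattern extends to response screening by appending the assistant turn to messages, so a single guard model can cover both sides of the exchange.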