Alessio Sacco
2025
A Perspective on LLM Data Generation with Few-shot Examples: from Intent to Kubernetes Manifest
Antonino Angi | Liubov Nedoshivina | Alessio Sacco | Stefano Braghin | Mark Purcell
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
The advent of Large Language Models (LLMs) has transformed how complex tasks across various domains can be automated. One of today's industry trends is Agentic AI, which leverages LLMs to operate multiple tools and provide automatic configuration. In the domain of cloud computing, Agentic AI might be used, for example, for the generation of Kubernetes manifests – structured configuration files that define containerized environments. However, effectively applying LLMs to domain-specific tasks often reveals knowledge gaps that impact the accuracy and reliability of the generated output. To address these challenges, we propose KGen, a pipeline for generating K8s manifests directly from user-described intents expressed in natural language using LLMs. Our approach leverages an extensive n-shot learning analysis to choose the number of examples that best guides the adopted models in generating the manifests, while also accounting for the computational cost. Our results validate the use of LLMs for this task and show that (as expected) increasing the number of n-shot examples can improve the quality of the generated configurations when adopting more specialized models, such as Mixtral-8x7B (which uses the Mixture of Experts approach) and Prometheus-8x7B-v2.0, but (surprisingly) for more general-purpose models like Llama3-8B and Llama3-70B, it can lead to a smaller number of valid K8s manifests. These results underscore the complexities of adapting LLMs for domain-specific structured generation and emphasize the need for an in-depth analysis to determine the most effective setup, also suggesting that smaller models sometimes outperform their larger counterparts on a given domain-specific task.
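To make the n-shot setup concrete, the following is a minimal sketch of how an n-shot prompt for intent-to-manifest generation might be assembled, paired with a crude structural validity check on the model's output. All function names, the example pair, and the key-based check are illustrative assumptions for this sketch, not the actual KGen pipeline; a real system would parse the YAML and validate it against the Kubernetes schema.

```python
# Hedged sketch: n-shot prompt assembly for intent -> K8s manifest generation.
# Names and the validity heuristic are assumptions, not the paper's method.

def build_nshot_prompt(intent: str, examples: list, n: int) -> str:
    """Prepend n (intent, manifest) example pairs to the user's intent."""
    parts = []
    for ex_intent, ex_manifest in examples[:n]:
        parts.append(f"Intent: {ex_intent}\nManifest:\n{ex_manifest}\n")
    parts.append(f"Intent: {intent}\nManifest:\n")
    return "\n".join(parts)

# Every Kubernetes object must declare these top-level fields.
REQUIRED_KEYS = ("apiVersion:", "kind:", "metadata:")

def looks_like_k8s_manifest(text: str) -> bool:
    """Crude check: the required top-level keys must all appear.
    A production pipeline would instead parse the YAML and validate
    it against the Kubernetes object schema."""
    return all(key in text for key in REQUIRED_KEYS)

# Hypothetical one-shot example pair (intent, manifest).
example = (
    "run an nginx web server with 3 replicas",
    "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n"
    "  name: nginx\nspec:\n  replicas: 3",
)
prompt = build_nshot_prompt(
    "expose a redis cache on port 6379", [example], n=1
)
```

Varying `n` here corresponds to the paper's n-shot analysis: more example pairs lengthen the prompt (raising computational cost) while steering the model toward well-formed manifests, with model-dependent effects on validity.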