K. R. Jayaram


2025

Large Language Models (LLMs) can be used to convert natural language (NL) instructions into structured business process automation (BPA) process artifacts. This paper contributes (i) FLOW-BENCH, a high-quality dataset of paired NL instructions and business process definitions to evaluate NL-based BPA tools and support research in this area, and (ii) FLOW-GEN, our approach to using LLMs to translate NL into an intermediate Python representation that facilitates final conversion into widely adopted business process definition languages, such as BPMN and DMN. We bootstrap FLOW-BENCH by demonstrating how it can be used to evaluate the components of FLOW-GEN across eight LLMs. We hope that FLOW-GEN and FLOW-BENCH catalyze further research in BPA.
Developers using LLMs and LLM-based agents in their applications have provided plenty of anecdotal evidence that in-context learning (ICL) is fragile. In this paper, we show that in addition to the quantity and quality of examples, the order in which the in-context examples are listed in the prompt affects the output of the LLM and, consequently, its performance. While prior work has explored improving ICL through dataset-dependent techniques, we introduce OptiSeq, a purely inference-time, dataset-free optimization method that efficiently determines the best example order. OptiSeq leverages log probabilities of LLM-generated outputs to systematically prune the search space of possible orderings and recommend the best order(s) by distinguishing orderings that yield high accuracy from those that underperform. Extensive empirical evaluation on multiple LLMs, datasets, and prompts demonstrates that OptiSeq improves accuracy by 5.5 - 10.5 percentage points across multiple tasks.
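The core scoring idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the OptiSeq implementation itself: `logprob_fn` is a hypothetical stand-in for prompting an LLM with a given example order and returning the total log probability of its generated output, and the exhaustive enumeration below omits the search-space pruning the abstract describes.

```python
from itertools import permutations

def rank_example_orders(examples, logprob_fn, top_k=1):
    """Score every ordering of in-context examples with a
    caller-supplied log-probability function and return the
    top_k highest-scoring orderings.

    logprob_fn(ordering) is a hypothetical callable that would,
    in practice, query an LLM with the examples in that order.
    """
    scored = [(logprob_fn(order), order) for order in permutations(examples)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [order for _, order in scored[:top_k]]

# Toy stand-in for an LLM: orderings that place example "a"
# earlier receive a higher (less negative) log probability.
toy_logprob = lambda order: -order.index("a")

best = rank_example_orders(["a", "b", "c"], toy_logprob)[0]
print(best)  # → ('a', 'b', 'c')
```

In a real setting the scoring call dominates cost, which is why pruning the ordering space (rather than enumerating all permutations as above) matters.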