Siyu Huo


2025

FLOW-BENCH: Towards Conversational Generation of Enterprise Workflows
Evelyn Duesterwald | Siyu Huo | Vatche Isahagian | K. R. Jayaram | Ritesh Kumar | Vinod Muthusamy | Punleuk Oum | Debashish Saha | Gegi Thomas | Praveen Venkateswaran
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models (LLMs) can be used to convert natural language (NL) instructions into structured business process automation (BPA) process artifacts. This paper contributes (i) FLOW-BENCH, a high quality dataset of paired NL instructions and business process definitions to evaluate NL-based BPA tools, and support research in this area, and (ii) FLOW-GEN, our approach to utilize LLMs to translate NL into an intermediate Python representation that facilitates final conversion into widely adopted business process definition languages, such as BPMN and DMN. We bootstrap FLOW-BENCH by demonstrating how it can be used to evaluate the components of FLOW-GEN across eight LLMs. We hope that FLOW-GEN and FLOW-BENCH catalyze further research in BPA.

2019

Graph Enhanced Cross-Domain Text-to-SQL Generation
Siyu Huo | Tengfei Ma | Jie Chen | Maria Chang | Lingfei Wu | Michael Witbrock
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Semantic parsing is a fundamental problem in natural language understanding, as it involves the mapping of natural language to structured forms such as executable queries or logic-like knowledge representations. Existing deep learning approaches for semantic parsing have shown promise on a variety of benchmark data sets, particularly on text-to-SQL parsing. However, most text-to-SQL parsers do not generalize to unseen data sets in different domains. In this paper, we propose a new cross-domain learning scheme to perform text-to-SQL translation and demonstrate its use on Spider, a large-scale cross-domain text-to-SQL data set. We improve upon a state-of-the-art Spider model, SyntaxSQLNet, by constructing a graph of column names for all databases and using graph neural networks to compute their embeddings. The resulting embeddings offer better cross-domain representations and SQL queries, as evidenced by substantial improvement on the Spider data set compared to SyntaxSQLNet.