Zhaochen Li
2026
OmniOData: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning
Tao Bai | Zhaochen Li | Hongxin Shao | Daniel Dahlmeier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Despite the success of Large Language Models (LLMs) in structured query generation, OData, a critical RESTful protocol for enterprise APIs, remains under-researched due to a lack of high-fidelity, execution-validated datasets. To bridge this gap, we introduce OmniOData, a framework that generates SynOData, the first large-scale OData corpus featuring execution-grounded queries and reasoning traces. Using this corpus, we develop OmniOData-R1 (1.5B–3B parameters), a family of models that match or surpass frontier proprietary systems, such as GPT-4o and Gemini 3, on realistic industrial benchmarks. Our results demonstrate that the synergy of execution-verified synthetic data and Reinforcement Learning (RL) effectively unlocks the latent reasoning of Small Language Models (SLMs), providing a high-performance, low-latency solution for specialized enterprise query generation. The code and data will be released under an open-source license.