Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI
Octavian Alexandru Trifan, Jason Lee Weber, Marc Titus Trifan, Alexandru Nicolau, Alexander Veidenbaum
Abstract
Edge deployment of task-oriented semantic parsers demands high accuracy under tight latency and memory budgets. We present Grammar Pruning, a lightweight zero-shot framework that begins with a user-defined schema of API calls and couples a rule-based entity extractor with an iterative grammar-constrained decoder: extracted items dynamically prune the context-free grammar, limiting generation to only those intents, slots, and values that remain plausible at each step. This aggressive search-space reduction both reduces hallucinations and slashes decoding time. On the adapted FoodOrdering, APIMixSNIPS, and APIMixATIS benchmarks, Grammar Pruning with small language models achieves an average execution accuracy of over 90%—rivaling state-of-the-art, cloud-based solutions—while sustaining at least 2x lower end-to-end latency than existing methods. By requiring nothing beyond the domain’s full API schema values, yet delivering precise, real-time natural-language understanding, Grammar Pruning positions itself as a practical building block for future edge-AI applications that cannot rely on large models or cloud offloading.
- Anthology ID:
- 2025.emnlp-main.858
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 16955–16968
- URL:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.858/
- DOI:
- 10.18653/v1/2025.emnlp-main.858
- Cite (ACL):
- Octavian Alexandru Trifan, Jason Lee Weber, Marc Titus Trifan, Alexandru Nicolau, and Alexander Veidenbaum. 2025. Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16955–16968, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI (Trifan et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.858.pdf
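The abstract's pipeline—rule-based extraction of schema values, pruning of the grammar to the surviving intents/slots/values, then grammar-constrained decoding—can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' implementation: the `SCHEMA` dictionary, the substring-match extractor, and the stand-in decoder are all assumptions made for illustration; a real system would mask the language model's logits to grammar-legal tokens at each decoding step.

```python
# Toy API schema: intents -> slots -> allowed values (illustrative only).
SCHEMA = {
    "order_food": {"item": ["pizza", "burger"], "size": ["small", "large"]},
    "book_flight": {"origin": ["sfo", "lax"], "dest": ["jfk", "ord"]},
}

def extract_entities(utterance, schema):
    """Rule-based extractor: keep only schema values that literally
    appear in the utterance."""
    tokens = set(utterance.lower().split())
    found = {}
    for intent, slots in schema.items():
        for slot, values in slots.items():
            hits = [v for v in values if v in tokens]
            if hits:
                found.setdefault(intent, {})[slot] = hits
    return found

def prune_grammar(schema, found):
    """Keep only the intents, slots, and values that remain plausible
    given the extracted entities."""
    if not found:
        return schema  # nothing extracted: fall back to the full grammar
    return dict(found)

def constrained_decode(pruned):
    """Stand-in for grammar-constrained decoding: emit one derivation
    the pruned grammar still allows."""
    intent = next(iter(pruned))
    args = ", ".join(f"{s}={vs[0]}" for s, vs in pruned[intent].items())
    return f"{intent}({args})"

utterance = "I want a large pizza"
pruned = prune_grammar(SCHEMA, extract_entities(utterance, SCHEMA))
call = constrained_decode(pruned)
print(call)  # order_food(item=pizza, size=large)
```

Because the extractor removes `book_flight` and all unmatched values before decoding begins, the decoder's search space shrinks to a handful of productions, which is the mechanism the abstract credits for both fewer hallucinations and lower latency.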