2025
SLOT: Structuring the Output of Large Language Models
Zhengyuan Shen | Darren Yow-Bang Wang | Soumya Smruti Mishra | Zhichao Xu | Yifei Teng | Haibo Ding
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Structured outputs are essential for large language models (LLMs) in critical applications like agents and information extraction. Despite their capabilities, LLMs often generate outputs that deviate from predefined schemas, significantly hampering reliable application development. We present SLOT (Structured LLM Output Transformer), a model-agnostic approach that transforms unstructured LLM outputs into precise structured formats. While existing solutions predominantly rely on constrained decoding techniques or are tightly coupled with specific models, SLOT employs a fine-tuned lightweight language model as a post-processing layer, achieving flexibility across various LLMs and schema specifications. We introduce SLOTBench, a benchmark curated with a data synthesis pipeline, alongside a formal evaluation methodology that quantifies both schema accuracy and content fidelity. Our results demonstrate that a fine-tuned Mistral-7B model with constrained decoding achieves near-perfect schema accuracy (99.5%) and content similarity (94.0%), outperforming Claude-3.5-Sonnet by substantial margins (+25 and +20 percentage points, respectively). Notably, even compact models like Llama-3.2-1B can match or exceed the structured output capabilities of much larger proprietary models when equipped with SLOT, enabling reliable structured generation in resource-constrained environments. SLOTBench will be released upon legal approval.
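As a rough illustration of the post-processing idea (not the authors' released code), the sketch below treats the fine-tuned lightweight model as an opaque `structurer` callable and validates its candidate against a JSON schema, retrying on failure. The function name, retry loop, and callable signature are assumptions for illustration; only the jsonschema validation call is a real library API.

```python
# A minimal sketch of SLOT-style post-processing, assuming the fine-tuned
# lightweight model is wrapped as a callable. Names are illustrative.
import json
from typing import Callable

from jsonschema import ValidationError, validate  # pip install jsonschema


def structure_output(
    raw_output: str,
    schema: dict,
    structurer: Callable[[str, str], str],
    max_retries: int = 2,
) -> dict:
    """Transform an unstructured LLM response into a schema-conforming dict.

    `structurer` stands in for the fine-tuned lightweight LM: it receives
    the raw text plus a JSON schema string and returns candidate JSON.
    """
    schema_str = json.dumps(schema)
    for _ in range(max_retries + 1):
        candidate = structurer(raw_output, schema_str)
        try:
            parsed = json.loads(candidate)
            validate(instance=parsed, schema=schema)  # raises on mismatch
            return parsed
        except (json.JSONDecodeError, ValidationError):
            continue  # constrained decoding would make retries unnecessary
    raise ValueError("could not produce schema-conforming output")
```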
IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
Aosong Feng | Balasubramaniam Srinivasan | Yun Zhou | Zhichao Xu | Kang Zhou | Sheng Guan | Yueyan Chen | Xian Wu | Ninad Kulkarni | Yi Zhang | Zhengyuan Shen | Dmitriy Bespalov | Soumya Smruti Mishra | Yifei Teng | Darren Yow-Bang Wang | Haibo Ding | Lin Lee Cheong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR, a quality-constrained Intelligent Prompt Routing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with a tolerance parameter τ ∈ [0, 1] that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-scale IPR dataset, a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family, and processes requests with sub-150ms latency.
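The abstract does not spell out the routing rule; one plausible reading of the tolerance parameter τ, sketched below, is that the router picks the cheapest model whose predicted quality is within τ of the best prediction. The decision rule, model names, costs, and quality scores are all assumptions for illustration.

```python
# A minimal sketch of a tolerance-based routing rule in the spirit of IPR.
# Assumed rule: cheapest model whose predicted quality is within tau of the
# best predicted quality. All values below are illustrative.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost_per_1k_tokens: float  # relative serving cost
    predicted_quality: float   # quality estimator output in [0, 1]


def route(candidates: list[Candidate], tau: float) -> Candidate:
    """tau = 0.0 always selects the highest-quality model; larger tau
    trades quality for cost."""
    best_quality = max(c.predicted_quality for c in candidates)
    eligible = [c for c in candidates
                if c.predicted_quality >= best_quality - tau]
    return min(eligible, key=lambda c: c.cost_per_1k_tokens)


models = [
    Candidate("large", cost_per_1k_tokens=15.0, predicted_quality=0.92),
    Candidate("medium", cost_per_1k_tokens=3.0, predicted_quality=0.88),
    Candidate("small", cost_per_1k_tokens=0.25, predicted_quality=0.74),
]
print(route(models, tau=0.05).name)  # -> "medium"
```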
2022
Spelling Correction using Phonetics in E-commerce Search
Fan Yang | Alireza Bagheri Garakani | Yifei Teng | Yan Gao | Jia Liu | Jingyuan Deng | Yi Sun
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
In E-commerce search, spelling correction plays an important role in helping customers find desired products when processing user-typed search queries. However, resolving phonetic errors is a critical but much-overlooked area. A query with phonetic spelling errors tends to appear correct based on pronunciation but is nonetheless inaccurate in spelling (e.g., “bluetooth sound system” vs. “blutut sant sistam”), with numerous noisy forms and sparse occurrences. In this work, we propose a generalized spelling correction system that integrates phonetics to address phonetic errors in E-commerce search without additional latency cost. Using the India (IN) E-commerce market for illustration, our experiments show that the proposed phonetic solution significantly improves the F1 score by 9%+ and the recall of phonetic errors by 8%+. This phonetic spelling correction system has been deployed to production and currently serves hundreds of millions of customers.
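As a toy illustration of phonetic matching (not the deployed system described in the paper), the sketch below buckets vocabulary terms by a simplified Soundex key, so a phonetic misspelling like “blutut” retrieves “bluetooth”. The indexing scheme and helper names are assumptions; a production system would use a richer phonetic encoding and candidate ranking.

```python
# A minimal sketch of phonetic candidate retrieval for spelling correction.
# Vocabulary terms are indexed by a simplified Soundex key; a misspelled
# token is looked up under the same key. Names are illustrative.
from collections import defaultdict

# Soundex consonant classes: bfpv=1, cgjkqsxz=2, dt=3, l=4, mn=5, r=6
_CODES = {c: d for d, letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in letters}


def soundex(word: str) -> str:
    """Simplified Soundex: first letter plus up to three consonant codes."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits, prev = [], _CODES.get(word[0])
    for ch in word[1:]:
        code = _CODES.get(ch)
        if code and code != prev:
            digits.append(str(code))
        if ch not in "hw":  # h/w do not separate letters with equal codes
            prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]


def build_index(vocabulary: list[str]) -> dict[str, list[str]]:
    """Bucket vocabulary terms by their phonetic key."""
    index = defaultdict(list)
    for term in vocabulary:
        index[soundex(term)].append(term)
    return index


index = build_index(["bluetooth", "sound", "system", "speaker"])
# The phonetic misspelling maps to the same bucket as the intended term:
print(index.get(soundex("blutut")))  # -> ['bluetooth']
```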