Dawei Xiang


2026

Large Language Models (LLMs) have fundamentally transformed natural language processing (NLP), demonstrating remarkable capabilities across a wide spectrum of tasks. However, when applied to instruction-based text editing, LLMs still fall short. Unlike free-form generation, instruction-based editing requires precise, targeted modifications that respect two essential properties: faithful implementation of the specific instruction and local fidelity. Existing approaches often overlook these properties, treating editing as a generic text generation problem. As a result, they either over-edit or fail to apply modifications consistently. To address this gap, we propose HyperEdit, a framework that adapts its processing to each editing request. To achieve this, HyperEdit generates request-specific dynamic weights that guide the editing process. The computational overhead of producing these weights is minimized through a carefully designed hypernetwork. With this design, HyperEdit achieves a 9% relative improvement over the state-of-the-art editing model.
As large language models (LLMs) improve, many applications are moving from a single LLM call to multi-agent systems. These systems often rely on either hand-designed or automatically optimized workflows with multiple verification and testing steps. While those extra steps can improve accuracy, they also increase latency and token costs. In practice, many queries do not need such heavy processing and can be handled well by a single strong agent. To address this inefficiency, we propose LLM-as-Scheduler (LAS), a system that dynamically chooses the right workflow for each query. LAS uses a two-stage cascade: first, a lightweight gate quickly evaluates each agent’s output; then, an LLM-based scheduler uses query features and gate signals to make more detailed routing decisions. Experiments show that LAS cuts token usage by 43% and reduces end-to-end latency by more than 36%, while causing at most a 1.4 percentage-point drop in accuracy compared with a strong fixed workflow.
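The two-stage cascade described above can be illustrated with a minimal sketch. It is not the LAS implementation: the function names, the two thresholds, the workflow labels, and the rule that the gate decides clear-cut cases on its own are all assumptions made for illustration, and the gate and scheduler are toy stand-ins for a learned gate and an LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    workflow: str  # hypothetical labels: "single_agent" or "full_workflow"
    stage: str     # which stage made the call: "gate" or "scheduler"


def route(query: str,
          draft: str,
          gate: Callable[[str, str], float],
          scheduler: Callable[[str, float], str],
          accept_at: float = 0.9,
          reject_at: float = 0.3) -> Decision:
    """Stage 1: a cheap gate scores the draft answer.
    Stage 2: only ambiguous scores are passed to the costlier scheduler.
    The thresholds are illustrative assumptions, not values from the paper."""
    score = gate(query, draft)
    if score >= accept_at:
        return Decision("single_agent", "gate")
    if score <= reject_at:
        return Decision("full_workflow", "gate")
    # Ambiguous case: the scheduler sees the query and the gate signal.
    return Decision(scheduler(query, score), "scheduler")


def toy_gate(query: str, draft: str) -> float:
    # Toy stand-in: longer drafts are treated as more trustworthy.
    if not draft.strip():
        return 0.0
    return min(1.0, len(draft.split()) / 5)


def toy_scheduler(query: str, gate_score: float) -> str:
    # Toy stand-in for an LLM call: long queries get the heavy workflow.
    return "full_workflow" if len(query.split()) > 12 else "single_agent"


if __name__ == "__main__":
    print(route("What is 2 + 2?", "The answer is clearly four.",
                toy_gate, toy_scheduler))
```

In this sketch the scheduler is invoked only when the gate score falls between the two thresholds, which is one way a cascade could keep the cost of routing itself low.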

2025

The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image (T2I) models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context—often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought (CoT) reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.