Yufei Liu


2026

Instruction Following (IF) is a core capability of LLMs, requiring strict adherence to diverse constraints, ranging from verifiable ones (e.g., output length) to unverifiable ones (e.g., tone). Reinforcement learning with verifiable rewards has emerged as a paradigm for IF tasks, leveraging LLM-as-a-judge to assess unverifiable constraints. However, we empirically find that this approach remains a significant bottleneck, suffering from severe reward hacking and higher computational overhead. In this work, we first analyze the generalization capabilities of unverifiable constraints and discover that specific constraints exhibit distinct, high-generalization patterns. Motivated by this, we propose TinyJudge, a framework that employs an ensemble of specialized tiny language models (e.g., 0.6B) to provide rewards for soft constraints. By distilling expertise from frontier models into these tiny models, it achieves high-precision, lightweight evaluation. Extensive evaluations across five benchmarks demonstrate that TinyJudge outperforms the baselines by ~10% in average performance and 12% in reward precision. Crucially, it also achieves a 3× speedup in total training time. Our work provides a scalable and robust path for aligning LLMs with unverifiable human instructions.

2025

Autoregressive Transformer (AT) dominates sequence-to-sequence generation tasks but suffers from high inference latency due to sequential token generation. Non-Autoregressive Transformer (NAT) improves inference efficiency by parallelizing token prediction, yet degrades generation quality. To address these limitations, we propose Tree-structured Non-Autoregressive Decoding (TNAD), a novel paradigm that bridges autoregressive and non-autoregressive decoding. TNAD generates a sentence through a top-down, layer-wise expansion of its constituency parse tree, enabling parallel generation within each layer while preserving contextual dependencies across layers. Experimental results on machine translation and paraphrase generation demonstrate that TNAD outperforms AT in efficiency and NAT in generation quality, thus offering a new alternative to AT and NAT in the trade-off between efficiency and quality. Our code is publicly available at https://github.com/jipy0222/TNAD.