Yang Zhou
2026
RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation
Sunzhu Li | Jiale Zhao | Huimin Ren | Zhenlin Wei | Yang Zhou | Jingwen Yang | Shunyu Liu | Kaike Zhang | Chen Wei
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verification, existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. To address this, we propose an automated Coarse-to-Fine Rubric Generation framework. By combining principle-guided synthesis, multi-model aggregation, and difficulty evolution, our approach produces comprehensive and highly discriminative criteria capable of capturing subtle nuances. Based on this framework, we introduce RubricHub, a large-scale (110k) and multi-domain dataset. We validate its utility through a two-stage post-training pipeline comprising Rubric-based Rejection Sampling Fine-Tuning (RuFT) and Reinforcement Learning (RuRL). Experimental results demonstrate that RubricHub unlocks significant performance gains: our post-trained Qwen3-14B achieves state-of-the-art (SOTA) results on HealthBench (69.3), surpassing proprietary frontier models such as GPT-5.
2025
FPE2M2: Approaching Lossless and Efficient Quantization with Native Floating Point
Ke Yi | Jianwei Zhang | Zhiying Xu | Xinlong Yang | Yang Zhou | Minmin Sun | Zengke Liu | Tong Zhang | Junyang Lin | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2025
Auto-regressive decoding is a memory-bound job, meaning decoding inference performance is limited by the memory bandwidth rather than the computational capabilities of the GPU. Weight-only quantization is a promising method to address this memory-bound limitation. Previous studies have followed one of two approaches. Some have exclusively studied integer quantization while ignoring the Gaussian distribution of LLMs’ weights. Others have proposed non-uniform quantization but incurred additional I/O overhead due to lookup tables (e.g., NF4). In this work, we extend the IEEE 754 floating-point standard to the ExMy quantization schema, which allocates x bits for the exponent and y bits for the mantissa to represent a number. In terms of runtime efficiency, we demonstrate that the conversion from ExMy to FP16 can be realized through register-level operations, achieving almost the same performance as INT5. In terms of quantization loss, we analyze different ExMy settings and find that the E2M2 schema achieves an optimal balance, offering the highest efficiency with lossless accuracy. We further propose the FPE2M2 framework, which supports lossless weight-only quantization inference. We validate FPE2M2 on Qwen and LLaMA models across various modalities, such as text, image, and audio tasks, where it achieves faster inference while maintaining nearly lossless accuracy.
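The ExMy idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the exponent bias, subnormal handling, or scaling strategy, so the sketch assumes an IEEE 754-style bias of 2^(x-1) - 1, subnormals at exponent zero, no inf/NaN encodings, a separate sign bit, and a single absmax scale per weight group. The function names `exmy_values` and `quantize` are hypothetical.

```python
def exmy_values(exp_bits: int, man_bits: int) -> list[float]:
    """Enumerate the non-negative magnitudes of an ExMy format.

    Assumptions (not from the paper): IEEE 754-style bias,
    subnormals when the exponent field is zero, no inf/NaN codes.
    """
    bias = 2 ** (exp_bits - 1) - 1
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                # Subnormal: no implicit leading 1.
                v = frac * 2.0 ** (1 - bias)
            else:
                # Normal: implicit leading 1.
                v = (1 + frac) * 2.0 ** (e - bias)
            values.add(v)
    return sorted(values)


def quantize(weights: list[float], exp_bits: int = 2, man_bits: int = 2) -> list[float]:
    """Round each weight to the nearest representable value (fake quantization).

    Assumes one absmax scale for the whole group, mapping the largest
    magnitude onto the largest representable value.
    """
    grid = exmy_values(exp_bits, man_bits)
    absmax = max(abs(w) for w in weights)
    scale = absmax / grid[-1] if absmax > 0 else 1.0
    out = []
    for w in weights:
        # Nearest grid magnitude in the scaled domain; the sign is kept separately.
        mag = min(grid, key=lambda g: abs(g - abs(w) / scale))
        out.append((-mag if w < 0 else mag) * scale)
    return out


if __name__ == "__main__":
    print(exmy_values(2, 2))
    print(quantize([7.0, -3.4, 0.3, 0.0]))
```

Under these assumptions, E2M2 has 16 magnitudes that are densely spaced near zero and sparser toward the maximum, which is the non-uniform spacing that suits roughly Gaussian weights. With the sign bit, each weight occupies 5 bits, matching the INT5 comparison in the abstract.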