Shengjun Wu


2026

Measurement scales play a crucial role in quantifying the nuanced dimensions of human cognition and behavior. However, their development typically demands extensive manual labor, and current methodologies lack systematic automation and standardized evaluation. In this paper, we introduce AutoScale, a multi-agent framework that automates scale development by leveraging collaborative AI agents. Our contributions are threefold: (1) a novel multi-agent LLM-based framework for end-to-end scale generation that replicates expert collaboration and iterative data-driven refinement; (2) the first comprehensive dataset, SCALE-1.2K, comprising 1.2K validated scales across 16 psychological domains, which establishes a benchmark for automated scale development; and (3) a multi-dimensional evaluation system featuring Multi-LLM-as-judge for conceptual and linguistic assessment and simulated large-scale testing for rigorous psychometric verification. Experimental results demonstrate that AutoScale streamlines the scale development process while maintaining rigorous quality standards, significantly reducing manual effort and paving the way for more efficient and objective measurement design in diverse research fields.