How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control

Kunhang Li, Jason Naradowsky, Yansong Feng, Yusuke Miyao


Abstract
We explore the human motion knowledge of Large Language Models (LLMs) through 3D avatar control. Given a motion instruction, we prompt LLMs to first generate a high-level movement plan with consecutive steps (**High-level Planning**), then specify body part positions in each step (**Low-level Planning**), which we linearly interpolate into avatar animations. Using 20 representative motion instructions that cover fundamental movements and balance body part usage, we conduct comprehensive evaluations, including human and automatic scoring of both high-level movement plans and generated animations, as well as automatic comparison with oracle positions in low-level planning. Our findings show that LLMs are strong at interpreting high-level body movements but struggle with precise body part positioning. While decomposing motion queries into atomic components improves planning, LLMs face challenges in multi-step movements involving high-degree-of-freedom body parts. Furthermore, LLMs provide reasonable approximations for general spatial descriptions, but fall short in handling precise spatial specifications. Notably, LLMs demonstrate promise in conceptualizing creative motions and distinguishing culturally specific motion patterns.
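The abstract's final pipeline stage, turning per-step body part positions into an animation by linear interpolation, can be illustrated with a small sketch. This is not the paper's code; the pose representation (a dict of body-part names to 3D positions) and the helper names are assumptions made for illustration.

```python
def lerp(p, q, t):
    # Linear interpolation between two 3D points at parameter t in [0, 1].
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def interpolate_keyframes(keyframes, frames_per_step):
    """Expand keyframe poses into animation frames by linear interpolation.

    keyframes: list of poses, each a dict mapping a body-part name
               (e.g. "left_hand") to an (x, y, z) position, as might be
               produced by the low-level planning step.
    frames_per_step: number of in-between frames per planning step.
    """
    frames = []
    for start, end in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_step):
            t = i / frames_per_step
            frames.append({part: lerp(start[part], end[part], t)
                           for part in start})
    frames.append(keyframes[-1])  # hold the final pose
    return frames
```

For two keyframes and four frames per step, this yields five frames, with each body part moving at constant velocity between the positions the planner specified.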
Anthology ID:
2025.findings-emnlp.747
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13878–13921
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.747/
DOI:
10.18653/v1/2025.findings-emnlp.747
Cite (ACL):
Kunhang Li, Jason Naradowsky, Yansong Feng, and Yusuke Miyao. 2025. How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 13878–13921, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.747.pdf
Checklist:
2025.findings-emnlp.747.checklist.pdf