TC-Bench: Benchmarking Temporal Compositionality in Conditional Video Generation

Weixi Feng; Jiachen Li; Michael Saxon; Tsu-Jui Fu; Wenhu Chen; William Yang Wang

Abstract

Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this work, we evaluate the emergence of new concepts and relation transitions as time progresses in a video, which we refer to as Temporal Compositionality. We propose TC-Bench, a benchmark of meticulously crafted text prompts, ground-truth videos, and new evaluation metrics. The prompts articulate the initial and final states of scenes, reducing ambiguity about how frames should develop. In addition, because we collect corresponding ground-truth videos, the benchmark can be used for both text-to-video and image-to-video generation. We develop new metrics to measure the completeness of component transitions, which demonstrate significantly higher correlations with human judgments than existing metrics. Our experiments reveal that contemporary video generators are still weak in prompt understanding and achieve less than 20% of the compositional changes, leaving substantial room for improvement. Our analysis indicates that current video generation models struggle to interpret descriptions of compositional changes and to synthesize various components across different time steps.

Anthology ID: 2025.findings-acl.241
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4638–4662
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.241/
Cite (ACL): Weixi Feng, Jiachen Li, Michael Saxon, Tsu-Jui Fu, Wenhu Chen, and William Yang Wang. 2025. TC-Bench: Benchmarking Temporal Compositionality in Conditional Video Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4638–4662, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): TC-Bench: Benchmarking Temporal Compositionality in Conditional Video Generation (Feng et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.241.pdf
