Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
Zequn Xie, Guijin Luo, Chuxin Wang, Sihang Cai, Tao Jin, Zhou Zhao, Yixuan Tang
Abstract
Text-based person anomaly search retrieves specific behavioral events from surveillance archives using natural-language queries. Although recent pose-aware methods align geometric structures well, they face a fundamental Pose-Semantic Gap: semantically different actions can share similar skeletal geometries. While Multimodal Large Language Models (MLLMs) can reduce this ambiguity, using them for large-scale retrieval is computationally prohibitive. We propose the Structure-Semantic Decoupled Cascade (SSDC) framework, which decouples retrieval into two stages: (1) Structure-Aware Coarse Retrieval, where a lightweight model quickly filters candidates by skeletal similarity; and (2) Detective Squad Interaction, a multi-agent semantic verification module. The squad consists of a Detective for fast binary filtering, an Analyst for evidence extraction, and a Writer for semantic synthesis. Finally, we re-rank candidates by fusing the synthesized captions with structural priors. Experiments on the PAB benchmark show that SSDC achieves state-of-the-art performance by balancing efficiency and semantic reasoning.
- Anthology ID: 2026.findings-acl.197
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 4040–4049
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.197/
- Cite (ACL): Zequn Xie, Guijin Luo, Chuxin Wang, Sihang Cai, Tao Jin, Zhou Zhao, and Yixuan Tang. 2026. Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4040–4049, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search (Xie et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.197.pdf
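The abstract describes the SSDC cascade only at a high level. The Python sketch below is a hypothetical illustration of that control flow, not the authors' implementation. Everything concrete in it is an assumption: cosine similarity over pose embeddings for the coarse stage, the Detective, Analyst, and Writer agents modeled as plain callables (presumably MLLM-backed in practice), and a linear fusion of a caption-match score with the structural score under an assumed weight `alpha`. The names `cascade_search`, `cosine_scores`, `top_k`, and `text_sim` are illustrative only.

```python
from typing import Callable

import numpy as np


def cosine_scores(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `gallery`."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q


def cascade_search(
    query_text: str,
    query_pose: np.ndarray,                 # hypothetical pose-aware query embedding, shape (d,)
    gallery_pose: np.ndarray,               # hypothetical pose-aware gallery embeddings, shape (N, d)
    detective: Callable[[str, int], bool],  # hypothetical fast binary check
    analyst: Callable[[str, int], str],     # hypothetical evidence extractor
    writer: Callable[[str], str],           # hypothetical caption synthesizer
    text_sim: Callable[[str, str], float],  # hypothetical query-caption similarity
    top_k: int = 10,
    alpha: float = 0.5,                     # assumed weight on the caption score
) -> list[tuple[int, float]]:
    # Stage 1: cheap structural filtering over the whole gallery; keep top_k items.
    struct = cosine_scores(query_pose, gallery_pose)
    candidates = [int(i) for i in np.argsort(-struct)[:top_k]]

    # Stage 2a: the Detective discards candidates that fail a yes/no check.
    kept = [i for i in candidates if detective(query_text, i)]

    # Stage 2b: Analyst -> Writer produce a caption, which is scored against the
    # query and linearly fused with the structural prior (assumed fusion rule).
    ranked = []
    for i in kept:
        caption = writer(analyst(query_text, i))
        score = alpha * text_sim(query_text, caption) + (1 - alpha) * float(struct[i])
        ranked.append((i, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy usage with stub agents (all values made up for illustration).
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
caption_scores = {"caption-1": 0.2, "caption-3": 0.9}
result = cascade_search(
    query_text="a person climbing over a fence",
    query_pose=np.array([1.0, 0.0]),
    gallery_pose=gallery,
    detective=lambda q, i: i != 0,  # item 0 matches the pose but is rejected
    analyst=lambda q, i: f"evidence-{i}",
    writer=lambda ev: ev.replace("evidence", "caption"),
    text_sim=lambda q, c: caption_scores[c],
    top_k=3,
)
print([i for i, _ in result])  # [3, 1]
```

In the toy run, item 0 has the best skeletal match but is rejected by the stub Detective, and item 3 overtakes item 1 once the caption score is fused in, which mirrors the pose-semantic disagreement the second stage targets. Only the structural stage touches the full gallery; the agent calls run on at most `top_k` candidates, which is where the efficiency of a cascade comes from.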