Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models
Guijin Luo, Zequn Xie, Sihang Cai, Chuxin Wang, Zhou Zhao, Yixuan Tang
Abstract
Person Re-Identification (ReID) has long struggled with the semantic gap between low-level visual features and high-level identity concepts. While Vision-Language Models (VLMs) offer promising semantic understanding, existing methods typically adopt a static "one-pass" paradigm, converting images to text once for retrieval. This approach suffers from two critical flaws: Information Bottleneck, where converting rich visuals into text causes detail loss, and Open-Loop Failure, where initial hallucinations propagate without recourse. To address this, we propose Auto-ReID, a novel framework that reformulates ReID as an iterative "Think-and-Refine" process. We first introduce a Hierarchical Progressive Tuning strategy to transform a generic VLM into a specialized Re-ID expert. During inference, we deploy a closed-loop architecture comprising a Reasoner for structured attribute extraction, a Hybrid Retriever that anchors dynamic semantic queries with stable visual features to prevent drift, and a Corrector that deconstructs and verifies candidates to iteratively optimize the search. Extensive experiments on ReID datasets demonstrate that our method significantly outperforms state-of-the-art approaches, particularly in complex occlusion scenarios.
- Anthology ID:
- 2026.findings-acl.312
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6292–6301
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.312/
- Cite (ACL):
- Guijin Luo, Zequn Xie, Sihang Cai, Chuxin Wang, Zhou Zhao, and Yixuan Tang. 2026. Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6292–6301, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models (Luo et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.312.pdf