Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models

Guijin Luo, Zequn Xie, Sihang Cai, Chuxin Wang, Zhou Zhao, Yixuan Tang


Abstract
Person Re-Identification (ReID) has long struggled with the semantic gap between low-level visual features and high-level identity concepts. While Vision-Language Models (VLMs) offer promising semantic understanding, existing methods typically adopt a static "one-pass" paradigm, converting each image to text once and retrieving on that description. This approach suffers from two critical flaws: an Information Bottleneck, in which converting rich visual content into text discards detail, and Open-Loop Failure, in which hallucinations in the initial description propagate into retrieval with no mechanism to correct them. To address these flaws, we propose Auto-ReID, a novel framework that reformulates ReID as an iterative "Think-and-Refine" process. We first introduce a Hierarchical Progressive Tuning strategy to transform a generic VLM into a specialized ReID expert. During inference, we deploy a closed-loop architecture comprising a Reasoner for structured attribute extraction, a Hybrid Retriever that anchors dynamic semantic queries with stable visual features to prevent drift, and a Corrector that deconstructs and verifies candidates to iteratively optimize the search. Extensive experiments on ReID datasets demonstrate that our method significantly outperforms state-of-the-art approaches, particularly in complex occlusion scenarios.
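The closed loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `hybrid_rank`, and `think_and_refine`, the weighted-sum fusion controlled by `alpha`, and the use of plain float vectors in place of VLM outputs are all assumptions made for illustration. The Reasoner's output is represented only by an initial semantic query vector, and the Corrector is a caller-supplied function that returns a revised query or `None` to accept the current ranking.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Candidate:
    person_id: str
    visual: Vector    # stand-in for a fixed visual embedding
    semantic: Vector  # stand-in for an embedding of described attributes


def hybrid_rank(query_visual: Vector, query_semantic: Vector,
                gallery: Sequence[Candidate], alpha: float = 0.5) -> List[Candidate]:
    """Rank by alpha * visual similarity + (1 - alpha) * semantic similarity.

    The weighted sum is an assumed fusion rule, not one taken from the paper.
    """
    def score(c: Candidate) -> float:
        return (alpha * cosine(query_visual, c.visual)
                + (1 - alpha) * cosine(query_semantic, c.semantic))
    return sorted(gallery, key=score, reverse=True)


def think_and_refine(
    query_visual: Vector,
    initial_semantic: Vector,
    gallery: Sequence[Candidate],
    corrector: Callable[[List[float], List[Candidate]], Optional[Vector]],
    max_rounds: int = 3,
    alpha: float = 0.5,
) -> List[Candidate]:
    """Re-rank until the corrector accepts the result or rounds run out."""
    semantic = list(initial_semantic)
    ranking = hybrid_rank(query_visual, semantic, gallery, alpha)
    for _ in range(max_rounds):
        revised = corrector(semantic, ranking)
        if revised is None:  # corrector has no further revision
            break
        semantic = list(revised)
        # The visual query stays fixed; only the semantic query changes.
        ranking = hybrid_rank(query_visual, semantic, gallery, alpha)
    return ranking
```

In this sketch the visual query never changes across rounds, which is one simple way to read "anchoring" a changing semantic query; how the paper actually fuses the two signals is not stated in the abstract.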
Anthology ID:
2026.findings-acl.312
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6292–6301
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.312/
Cite (ACL):
Guijin Luo, Zequn Xie, Sihang Cai, Chuxin Wang, Zhou Zhao, and Yixuan Tang. 2026. Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6292–6301, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models (Luo et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.312.pdf
Checklist:
 2026.findings-acl.312.checklist.pdf