Hefei Ling
2026
Beyond Query Bias: Candidate-Aware Iterative Refinement for Zero-Shot Composed Image Retrieval
Nan Sun | Jing Tang | Lei Sun | Rui Chen | Yuxing Lu | Xiangxiang Chu | Hefei Ling | Yujun Cai
Findings of the Association for Computational Linguistics: ACL 2026
Zero-Shot Composed Image Retrieval (ZS-CIR) retrieves target images using a reference image and modification text, without task-specific training. Existing methods typically rely on MLLMs to generate queries that are then encoded into vectors by pre-trained models such as CLIP. However, these constructed queries suffer from an inherent cognitive bias because the candidate distribution is unknown. We propose CoRR, a training-free framework that reframes ZS-CIR as a self-correcting process through bias-aware query refinement. CoRR uses retrieved results as feedback to perceive the candidate distribution. With carefully designed CoT prompting, the MLLM inspects the retrieved candidates to identify intent misalignments in the query and then corrects them via Historical Query Fusion. We also introduce Retrieval-Driven Caption Optimization to provide context-aligned examples, reducing phrasing and style mismatches. Experiments on public benchmarks show that CoRR significantly outperforms state-of-the-art methods.
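The feedback loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `retrieve`, `fuse`, `refine_loop`, and `correct_fn` are hypothetical, the MLLM inspection step is replaced by a caller-supplied callback, and Historical Query Fusion is approximated here by a plain average of past query vectors, which is an assumption made only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def retrieve(query, candidates, k):
    """Return indices of the k candidates with highest cosine similarity."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(normalize(c), q)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])[:k]

def fuse(history):
    """Assumed fusion rule: average all past query vectors, then renormalize."""
    dim = len(history[0])
    mean = [sum(h[d] for h in history) / len(history) for d in range(dim)]
    return normalize(mean)

def refine_loop(query, candidates, correct_fn, rounds=3, k=2):
    """Retrieve, let correct_fn inspect the results, fuse, and repeat."""
    history = [normalize(query)]
    for _ in range(rounds):
        fused = fuse(history)
        top = retrieve(fused, candidates, k)
        # correct_fn is a stand-in for the MLLM that inspects retrieved
        # candidates and proposes a corrected query vector.
        corrected = correct_fn(fused, [candidates[i] for i in top])
        history.append(normalize(corrected))
    fused = fuse(history)
    return retrieve(fused, candidates, k), fused
```

A caller would pass precomputed embedding vectors as `candidates` and a function wrapping the inspection model as `correct_fn`; keeping the full query history in the fusion step means a single poor correction cannot fully overwrite the original intent.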
EntroBench: Evaluating LLM Watermarking Under Multi-Entropy Scenarios and Practical User Operations
Pengyuan Qin | Linnan Tu | Yuhan Ke | Hefei Ling
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM) watermarking has been proposed as an active approach to content provenance verification, yet existing evaluations are largely confined to fixed entropy settings. In this paper, we introduce EntroBench, a benchmark for LLM watermarking that systematically covers three entropy levels and seven representative tasks. We conduct a fair evaluation of eight watermarking methods through hyper-parameter search based on an anchored dataset. We find that current approaches struggle to perform consistently across different entropy levels. Our analysis reveals a clear trade-off between watermark detectability and downstream output quality that varies across tasks and entropy conditions. Furthermore, we assess watermark robustness under realistic user interaction scenarios and show that common, non-adversarial user behaviors can substantially degrade watermark signals. These results indicate that practical, usage-driven perturbations pose a significant challenge to current watermarking techniques. EntroBench provides a unified evaluation framework for studying these issues and supports the development of more adaptive and robust LLM watermarking methods. The dataset and code are available at https://github.com/py-qin/EntroBench.