Shujun Cao
2026
Looking Beyond the One: Operationalizing and Eliciting Visual Ambiguity in VLLMs
Yuchong Chen | Bowei Zou | Yuhan Chen | Yifan Fan | Xinyu Li | Shujun Cao | Yu Hong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Visual questions are often ambiguous: the same image–question pair may admit multiple valid answers depending on which region is referenced. However, current Visual Question Answering (VQA) systems typically collapse this ambiguity, committing to a single interpretation during decoding and evaluation. In this work, we study visual question ambiguity from a grounded, region-centric perspective. We operationalize ambiguity as the existence of multiple distinct answer-supporting regions in an image, each independently yielding a valid answer. This formulation makes ambiguity observable without requiring exhaustive multi-answer annotations. Based on this definition, we conduct a systematic empirical study of state-of-the-art Visual Large Language Models (VLLMs). We find that, under default decoding, VLLMs consistently under-report ambiguity—even when multiple valid visual groundings are present. Importantly, probing model hidden states reveals that ambiguity-related signals are already encoded in their internal representations, despite not being reliably expressed in outputs. Finally, we show that selectively activating multi-focus answering based on these signals can recover additional valid answers while avoiding excessive hallucination. Together, our results suggest that ambiguity in VQA is not merely an annotation artifact or capability limitation, but a property that VLLMs internally recognize yet often fail to surface under standard decoding assumptions.
2025
Enhancing Attributed Question Answering using Tailored Progressive Curriculum Learning
Yuhan Chen | Bowei Zou | Yifan Fan | Yuchong Chen | Shujun Cao | Yu Hong
Findings of the Association for Computational Linguistics: EMNLP 2025
We study Attributed Question Answering (AQA), a recently released long-form answer generation task. Tailored, efficient training schemes have not yet been used to strengthen AQA models, which hinders the simultaneous enhancement of their essential capabilities, including evidence identification, cross-source relation recognition, and anti-distraction reasoning. To address this issue, we propose a tailored progressive curriculum learning approach and use it to optimize both encoder-decoder and decoder-only AQA models. Experiments on the QuoteSum benchmark show that our approach yields substantial improvements, with AQA performance reaching a Sem-F1 score of 73.9%.