VisTW: Benchmarking Vision-Language Models for Taiwanese Mandarin in Taiwan
Zhi Rui Tam, Yung-Yu Shih, Yen-Wei Lee, Ya-Ting Pai, Wen Yu Chang, Yun-Nung Chen
Abstract
Vision-Language Models (VLMs) often struggle in Taiwanese Mandarin environments due to region-specific orthographic and cultural context. We introduce VisTW, a comprehensive benchmark featuring (i) multiple-choice questions (3,795 academic questions) and (ii) free-form generation evaluation (141 Taiwanese-context free-form pairs). Beyond standard accuracy, we investigate character mixing (the unintended production of Simplified Chinese characters under Taiwanese-Mandarin-style prompts) and propose a human-grounded purity penalty derived from perceptual thresholds measured from users. Our evaluation reveals substantial character contamination (3%–19%) across state-of-the-art VLMs. We find that Gemini-3-Pro significantly outperforms the strongest open-weight baseline, Qwen3 235B MoE, by up to 22 percentage points on dialogue tasks once the purity penalty is applied. These results highlight orthographic consistency as a vital, yet overlooked, dimension for localized multimodal evaluation and deployment.
- Anthology ID:
- 2026.findings-acl.1830
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36711–36756
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1830/
- Cite (ACL):
- Zhi Rui Tam, Yung-Yu Shih, Yen-Wei Lee, Ya-Ting Pai, Wen Yu Chang, and Yun-Nung Chen. 2026. VisTW: Benchmarking Vision-Language Models for Taiwanese Mandarin in Taiwan. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36711–36756, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- VisTW: Benchmarking Vision-Language Models for Taiwanese Mandarin in Taiwan (Tam et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1830.pdf