Evaluating Perspectival Biases in Cross-Modal Retrieval
Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, Chompakorn Chaksangchaichot, Patawee Prakrankamanant, Witthawin Sripheanpol, Pak Lovichit, Sarana Nutanong, Ekapol Chuangsuwanich
Abstract
Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic **prevalence** and **cultural** associations. We introduce the **Cross-Cultural, Cross-Modal, Cross-lingual Multimodal (3XCM)** benchmark to isolate these effects. Results from our studies indicate that, for image-to-text retrieval, models tend to favor entries from prevalent languages over those that are semantically faithful. For text-to-image retrieval, we observe a consistent "tugging effect" in the joint embedding space between semantic alignment and language-conditioned cultural association. When semantic representations are insufficiently resolved, particularly in low-resource languages, similarity is increasingly governed by culturally familiar visual patterns, leading to systematic association bias in retrieval. Our findings suggest that achieving equitable multimodal retrieval necessitates targeted strategies that explicitly decouple language from culture, rather than relying solely on broader data exposure. This work highlights the need to treat linguistic and cultural biases as distinct, measurable challenges in multimodal representation learning.
- Anthology ID:
- 2026.findings-acl.1795
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36018–36049
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1795/
- Cite (ACL):
- Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, Chompakorn Chaksangchaichot, Patawee Prakrankamanant, Witthawin Sripheanpol, Pak Lovichit, Sarana Nutanong, and Ekapol Chuangsuwanich. 2026. Evaluating Perspectival Biases in Cross-Modal Retrieval. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36018–36049, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating Perspectival Biases in Cross-Modal Retrieval (Saengsukhiran et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1795.pdf