Evaluating Perspectival Biases in Cross-Modal Retrieval

Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, Chompakorn Chaksangchaichot, Patawee Prakrankamanant, Witthawin Sripheanpol, Pak Lovichit, Sarana Nutanong, Ekapol Chuangsuwanich


Abstract
Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic **prevalence** and **cultural** associations. We introduce the **Cross-Cultural, Cross-Modal, Cross-lingual Multimodal (3XCM)** benchmark to isolate these effects. Results from our studies indicate that, for image-to-text retrieval, models tend to favor entries from prevalent languages over those that are semantically faithful. For text-to-image retrieval, we observe a consistent "tugging effect" in the joint embedding space between semantic alignment and language-conditioned cultural association. When semantic representations are insufficiently resolved, particularly in low-resource languages, similarity is increasingly governed by culturally familiar visual patterns, leading to systematic association bias in retrieval. Our findings suggest that achieving equitable multimodal retrieval necessitates targeted strategies that explicitly decouple language from culture, rather than relying solely on broader data exposure. This work highlights the need to treat linguistic and cultural biases as distinct, measurable challenges in multimodal representation learning.
Anthology ID:
2026.findings-acl.1795
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
36018–36049
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1795/
Cite (ACL):
Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, Chompakorn Chaksangchaichot, Patawee Prakrankamanant, Witthawin Sripheanpol, Pak Lovichit, Sarana Nutanong, and Ekapol Chuangsuwanich. 2026. Evaluating Perspectival Biases in Cross-Modal Retrieval. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36018–36049, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Evaluating Perspectival Biases in Cross-Modal Retrieval (Saengsukhiran et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1795.pdf
Checklist:
2026.findings-acl.1795.checklist.pdf