Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering

Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan Lee Boyd-Graber


Abstract
Multilingual question answering (QA) systems must ensure factual consistency across languages, especially for objective queries such as "What is jaundice?", while also accounting for cultural variation in subjective responses. We propose MIND, a user-in-the-loop fact-checking pipeline to detect factual and cultural discrepancies in multilingual QA knowledge bases. MIND highlights divergent answers to culturally sensitive questions (e.g., "Who assists in childbirth?") that vary by region and context. We evaluate MIND on a bilingual QA system in the maternal and infant health domain and release a dataset of bilingual questions annotated for factual and cultural inconsistencies. We further test MIND on datasets from other domains to assess generalization. In all cases, MIND reliably identifies inconsistencies, supporting the development of more culturally aware and factually consistent QA systems.
Anthology ID:
2025.emnlp-main.1120
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
22024–22065
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1120/
Cite (ACL):
Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, and Jordan Lee Boyd-Graber. 2025. Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22024–22065, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering (Calvo-Bartolomé et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1120.pdf
Checklist:
 2025.emnlp-main.1120.checklist.pdf