UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Ladislav Lenc, Daniel Cífka, Jiri Martinek, Jakub Šmíd, Pavel Kral


Abstract
This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings and combined the models to obtain the best possible result. Our approach achieved 7th place in the monolingual subtask and 9th in the cross-lingual subtask. Since multilingual models did not achieve satisfactory results, we used only English translations as input to the text embedding models. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained with the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
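The retrieval step described in the abstract, embedding posts and fact-checked claims and then ranking claims by cosine similarity, with an optional combination of several embedding models, can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the per-model embedding matrices are assumed to be precomputed, and averaging per-model similarity scores is one simple combination scheme that may differ from the paper's exact approach.

```python
import numpy as np

def cosine_sim(posts: np.ndarray, claims: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix between post embeddings and claim embeddings."""
    posts = posts / np.linalg.norm(posts, axis=1, keepdims=True)
    claims = claims / np.linalg.norm(claims, axis=1, keepdims=True)
    return posts @ claims.T

def retrieve(post_embs: list[np.ndarray],
             claim_embs: list[np.ndarray],
             top_k: int = 10) -> np.ndarray:
    """Rank fact-checked claims for each post.

    post_embs[i] and claim_embs[i] are the embeddings produced by the
    i-th model (e.g. NV-Embed-v2, GPT, or Mistral).  Averaging the
    per-model similarity matrices combines the models.
    """
    sims = np.mean(
        [cosine_sim(p, c) for p, c in zip(post_embs, claim_embs)], axis=0
    )
    # Indices of the top_k most similar claims per post, best first.
    return np.argsort(-sims, axis=1)[:, :top_k]
```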
Anthology ID:
2025.semeval-1.31
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
209–215
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.31/
Cite (ACL):
Ladislav Lenc, Daniel Cífka, Jiri Martinek, Jakub Šmíd, and Pavel Kral. 2025. UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 209–215, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval (Lenc et al., SemEval 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.31.pdf