UMUTeam at SemEval-2025 Task 7: Multilingual Fact-Checked Claim Retrieval with XLM-RoBERTa and Self-Alignment Pretraining Strategy

Ronghao Pan, Tomás Bernal - Beltrán, José Antonio García - Díaz, Rafael Valencia - García


Abstract
In today’s digital age, the rapid dissemination of information through social networks poses significant challenges in verifying the veracity of shared content. The proliferation of misinformation can have serious consequences, influencing public opinion, policy decisions, and social dynamics. Fact-checking plays a critical role in countering misinformation; however, the manual verification process is time-consuming, especially when dealing with multilingual content. This paper presents our participation in the Multilingual and Crosslingual Fact-Checked Claim Retrieval task (SemEval 2025), which seeks to identify previously fact-checked claims relevant to social media posts. Our proposed system leverages XLM-RoBERTa, a multilingual Transformer model, combined with metric learning and hard negative mining strategies, to optimize the semantic comparison of posts and fact-checks across multiple languages. By fine-tuning a shared embedding space and employing a multiple similarity loss function, our approach enhances retrieval accuracy while maintaining efficiency. Evaluation results demonstrate competitive performance across multiple languages, reaching 25th place and highlighting the potential of multilingual NLP models in automating the fact-checking process and mitigating misinformation spread.
Anthology ID:
2025.semeval-1.103
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
757–762
Language:
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.103/
DOI:
Bibkey:
Cite (ACL):
Ronghao Pan, Tomás Bernal - Beltrán, José Antonio García - Díaz, and Rafael Valencia - García. 2025. UMUTeam at SemEval-2025 Task 7: Multilingual Fact-Checked Claim Retrieval with XLM-RoBERTa and Self-Alignment Pretraining Strategy. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 757–762, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
UMUTeam at SemEval-2025 Task 7: Multilingual Fact-Checked Claim Retrieval with XLM-RoBERTa and Self-Alignment Pretraining Strategy (Pan et al., SemEval 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.103.pdf