TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval

Aleksei Dorkin, Kairit Sirts


Abstract
We present our submission to Task 5 of SemEval-2025. We frame the task as an information retrieval problem, where the document content is used to retrieve subject tags from a large subject taxonomy. We leverage two types of encoder models to build a two-stage information retrieval system: a bi-encoder for coarse-grained candidate extraction at the first stage, and a cross-encoder for fine-grained re-ranking at the second stage.
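
The retrieve-then-rerank shape described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the names `embed`, `cosine`, `joint_score`, and `tag` are hypothetical, and simple token-count functions stand in for the neural bi-encoder and cross-encoder. Only the control flow is meant to carry over: a cheap first stage narrows the taxonomy to `k` candidates, and a more expensive pairwise scorer re-orders them.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: encodes one text on its own
    as a bag-of-words count vector (assumption, not a neural model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def joint_score(document, subject):
    """Toy stand-in for a cross-encoder: scores the (document, subject)
    pair jointly, here as the fraction of subject tokens found in the document."""
    doc_tokens = set(document.lower().split())
    subject_tokens = subject.lower().split()
    if not subject_tokens:
        return 0.0
    return sum(t in doc_tokens for t in subject_tokens) / len(subject_tokens)


def tag(document, subjects, k=3, n=2):
    """Return up to n subject tags for a document using two stages."""
    # Stage 1 (coarse): subject vectors do not depend on the document,
    # so in practice they could be computed once and reused.
    doc_vec = embed(document)
    subject_vecs = {s: embed(s) for s in subjects}
    candidates = sorted(
        subjects, key=lambda s: cosine(doc_vec, subject_vecs[s]), reverse=True
    )[:k]

    # Stage 2 (fine): score only the k candidates pairwise and re-rank them.
    reranked = sorted(
        candidates, key=lambda s: joint_score(document, s), reverse=True
    )
    return reranked[:n]


if __name__ == "__main__":
    taxonomy = [
        "machine learning",
        "information retrieval",
        "organic chemistry",
        "medieval history",
    ]
    text = "neural models for information retrieval and ranking"
    print(tag(text, taxonomy, k=3, n=1))
```

The split reflects a common trade-off: independently encoded texts can be compared against a large taxonomy cheaply, while pairwise scoring is costlier and is therefore applied only to the short candidate list.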
Anthology ID:
2025.semeval-1.319
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
2449–2454
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.319/
Cite (ACL):
Aleksei Dorkin and Kairit Sirts. 2025. TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2449–2454, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval (Dorkin & Sirts, SemEval 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.319.pdf