DUTIR831 at SemEval-2025 Task 5: A Multi-Stage LLM Approach to GND Subject Assignment for TIBKAT Records
Yicen Tian, Erchen Yu, Yanan Wang, Dailin Li, Jiaqi Yao, Hongfei Lin, Linlin Zong, Bo Xu
Abstract
This paper introduces DUTIR831’s approach to SemEval-2025 Task 5, which focuses on generating relevant subjects from the Integrated Authority File (GND) for tagging multilingual technical records in the TIBKAT database. To address the challenges of understanding the hierarchical GND taxonomy and automating subject assignment, a three-stage approach is proposed: (1) a data synthesis stage that uses an LLM to generate and selectively filter high-quality data, (2) a model training module that leverages LLMs and various training strategies to acquire GND knowledge and refine TIBKAT preferences, and (3) a subject terms completion mechanism consisting of multi-sampling ranking, subject terms extraction using an LLM, vector-based model retrieval, and various re-ranking strategies. In the quantitative evaluation, our system ranked 2nd on the all-subject datasets and 4th on the tib-core-subjects datasets. In the qualitative evaluation, it ranked 2nd on the tib-core-subjects datasets.
- Anthology ID:
- 2025.semeval-1.52
- Volume:
- Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
- Venues:
- SemEval | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 363–372
- URL:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.52/
- Cite (ACL):
- Yicen Tian, Erchen Yu, Yanan Wang, Dailin Li, Jiaqi Yao, Hongfei Lin, Linlin Zong, and Bo Xu. 2025. DUTIR831 at SemEval-2025 Task 5: A Multi-Stage LLM Approach to GND Subject Assignment for TIBKAT Records. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 363–372, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- DUTIR831 at SemEval-2025 Task 5: A Multi-Stage LLM Approach to GND Subject Assignment for TIBKAT Records (Tian et al., SemEval 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.52.pdf