Tan Xi
2025
RUC Team at SemEval-2025 Task 5: Fast Automated Subject Indexing: A Method Based on Similar Records Matching and Related Subject Ranking
Xia Tian
|
Yang Xin
|
Wu Jing
|
Xiu Heng
|
Zhang Xin
|
Li Yu
|
Gao Tong
|
Tan Xi
|
Hu Dong
|
Chen Tao
|
Jia Zhi
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents MaRSI, an automatic subject indexing method designed to address the limitations of traditional manual indexing and emerging GenAI technologies. Focusing on improving indexing accuracy in cross-lingual contexts and on balancing efficiency and accuracy on large-scale datasets, MaRSI mimics how human indexers learn from reference examples by constructing a semantic index from pre-indexed documents. For a new document, it computes similarity to retrieve relevant reference records, then merges and reranks their subjects to generate the indexing result. Experiments demonstrate that MaRSI outperforms supervised fine-tuning of LLMs on the same dataset, offering advantages in speed, effectiveness, and interpretability.
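The retrieve, merge, and rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for whatever semantic encoder MaRSI uses, the similarity-weighted voting rule is an assumed ranking scheme, and the names `suggest_subjects`, `k`, and `top_n` as well as the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for a semantic encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_subjects(query, records, k=2, top_n=3):
    """Retrieve the k most similar pre-indexed records, merge their
    subjects, and rank subjects by summed similarity (assumed rule)."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), subjects) for text, subjects in records),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, subjects in scored:
        for subject in subjects:
            votes[subject] += sim  # subjects shared by several neighbours rise
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [subject for subject, _ in ranked[:top_n]]


# Hypothetical pre-indexed records: (text, assigned subjects).
records = [
    ("deep learning for image classification",
     ["Machine learning", "Computer vision"]),
    ("neural networks for text classification",
     ["Machine learning", "Natural language processing"]),
    ("medieval trade routes in europe",
     ["History", "Economics"]),
]

result = suggest_subjects("image classification with neural networks", records)
print(result)
```

In this toy run, "Machine learning" ranks first because it is attached to both retrieved records, which also shows where the interpretability comes from: every suggested subject can be traced back to the reference records that contributed it.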