Nguyen Hong Buu Long


2025

Systematic Evaluation of Machine Learning and Transformer-Based Methods for Scientific Telescope Literature Classification
Huynh Trung Kiet | Dao Sy Duy Minh | Tran Chi Nguyen | Nguyen Lam Phu Quy | Pham Phu Hoa | Nguyen Dinh Ha Duong | Dinh Dien | Nguyen Hong Buu Long
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications

Recent space missions such as Hubble, Chandra, and JWST have produced a rapidly growing body of scientific literature. Maintaining telescope bibliographies is essential for mission assessment and research traceability, yet current curation processes rely heavily on manual annotation and do not scale. To facilitate progress in this direction, the TRACS @ WASP 2025 shared task provides a benchmark for automatic telescope bibliographic classification based on scientific publications. In this work, we conduct a comparative study of modeling strategies for this task. We first explore traditional machine learning methods such as multinomial Naive Bayes with TF–IDF and CountVectorizer representations. We then evaluate transformer-based multi-label classification using BERT-based scientific language models. Finally, we investigate a task-wise classification approach, where we decompose the problem into separate prediction tasks and train a dedicated model for each. In addition, we experiment with a limited-resource LLM-based approach, showing that even without full fine-tuning and using only a partial subset of the training data, LLMs exhibit promising potential for telescope classification. Our best system achieves a macro F1 of 0.72 with BERT-based models on the test evaluation, substantially outperforming the official openai-gpt-oss-20b baseline (0.31 macro F1).
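The reported scores are macro F1 values, which average per-label F1 without weighting by label frequency, so rare telescope labels count as much as common ones. The following is a minimal sketch of that computation for multi-label predictions, not the shared task's official scorer: the label names, the toy data, and the convention of scoring a label with no gold and no predicted instances as 0.0 are all assumptions made for illustration.

```python
from typing import Sequence, Set


def macro_f1(gold: Sequence[Set[str]],
             pred: Sequence[Set[str]],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over a fixed label list."""
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label that never occurs and is never predicted scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and toy data, one set of labels per paper.
labels = ["HST", "CHANDRA", "JWST"]
gold = [{"HST"}, {"JWST"}, {"HST", "CHANDRA"}, set()]
pred = [{"HST"}, {"HST"}, {"CHANDRA"}, set()]

score = macro_f1(gold, pred, labels)
print(f"macro F1 = {score:.2f}")  # per-label F1: HST 0.5, CHANDRA 1.0, JWST 0.0
```

In this toy case the missed JWST paper pulls the average down to 0.50 even though most individual predictions are correct, which illustrates why macro F1 is sensitive to performance on rare classes.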