Systematic Evaluation of Machine Learning and Transformer-Based Methods for Scientific Telescope Literature Classification

Huynh Trung Kiet; Dao Sy Duy Minh; Tran Chi Nguyen; Nguyen Lam Phu Quy; Pham Phu Hoa; Nguyễn Đình Hà Dương; Dinh Dien; Nguyen Hong Buu Long

Abstract

Recent space missions such as Hubble, Chandra, and JWST have produced a rapidly growing body of scientific literature. Maintaining telescope bibliographies is essential for mission assessment and research traceability, yet current curation processes rely heavily on manual annotation and do not scale. To facilitate progress in this direction, the TRACS @ WASP 2025 shared task provides a benchmark for automatic telescope bibliographic classification based on scientific publications. In this work, we conduct a comparative study of modeling strategies for this task. We first explore traditional machine learning methods such as multinomial Naive Bayes with TF–IDF and CountVectorizer representations. We then evaluate transformer-based multi-label classification using BERT-based scientific language models. Finally, we investigate a task-wise classification approach, where we decompose the problem into separate prediction tasks and train a dedicated model for each. In addition, we experiment with a limited-resource LLM-based approach, showing that even without full fine-tuning and using only a partial subset of the training data, LLMs exhibit promising potential for telescope classification. Our best system achieves a macro F1 of 0.72 with BERT-based models on the test evaluation, substantially outperforming the official openai-gpt-oss-20b baseline (0.31 macro F1).
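The headline numbers above are macro F1 scores. As a minimal sketch of how that metric is usually computed for multi-label predictions, the function below calculates F1 separately for each label and then takes the unweighted mean, so rare labels count as much as frequent ones. The label names and toy data are hypothetical and are not drawn from the shared-task dataset, and the official scorer may differ in detail (for example, in how it treats labels with no positive examples).

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores for multi-label predictions."""
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label that never occurs in gold or pred scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: each set holds the labels assigned to one paper.
LABELS = ["HST", "CHANDRA", "JWST"]
gold = [{"HST"}, {"CHANDRA", "JWST"}, {"JWST"}, set()]
pred = [{"HST"}, {"CHANDRA"}, {"HST"}, set()]

score = macro_f1(gold, pred, LABELS)
print(f"macro F1 = {score:.4f}")
```

In this toy case the rarely predicted label pulls the average down sharply even though the other two labels score well, which is why macro F1 is a demanding metric when label frequencies are imbalanced.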

Anthology ID: 2025.wasp-main.16
Volume: Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Month: December
Year: 2025
Address: Mumbai, India and virtual
Editors: Alberto Accomazzi, Tirthankar Ghosal, Felix Grezes, Kelly Lockhart
Venues: WASP | WS
Publisher: Association for Computational Linguistics
Pages: 136–145
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.16/
Cite (ACL): Huynh Trung Kiet, Dao Sy Duy Minh, Tran Chi Nguyen, Nguyen Lam Phu Quy, Pham Phu Hoa, Nguyen Dinh Ha Duong, Dinh Dien, and Nguyen Hong Buu Long. 2025. Systematic Evaluation of Machine Learning and Transformer-Based Methods for Scientific Telescope Literature Classification. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 136–145, Mumbai, India and virtual. Association for Computational Linguistics.
Cite (Informal): Systematic Evaluation of Machine Learning and Transformer-Based Methods for Scientific Telescope Literature Classification (Kiet et al., WASP 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.16.pdf
