AutoML Meets Hugging Face: Domain-Aware Pretrained Model Selection for Text Classification

Parisa Safikhani, David Broneske


Abstract
The effectiveness of embedding methods is crucial for optimizing text classification performance in Automated Machine Learning (AutoML). However, selecting the most suitable pre-trained model for a given task remains challenging. This study introduces the Corpus-Driven Domain Mapping (CDDM) pipeline, which utilizes a domain-annotated corpus of pre-fine-tuned models from the Hugging Face Model Hub to improve model selection. Integrating these models into AutoML systems significantly boosts classification performance across multiple datasets compared to baseline methods. Despite some domain recognition inaccuracies, results demonstrate CDDM’s potential to enhance model selection, streamline AutoML workflows, and reduce computational costs.
Anthology ID:
2025.naacl-srw.45
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
April
Year:
2025
Address:
Albuquerque, USA
Editors:
Abteen Ebrahimi, Samar Haider, Emmy Liu, Maria Leonor Pacheco, Shira Wein
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
466–473
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.45/
Cite (ACL):
Parisa Safikhani and David Broneske. 2025. AutoML Meets Hugging Face: Domain-Aware Pretrained Model Selection for Text Classification. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 466–473, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
AutoML Meets Hugging Face: Domain-Aware Pretrained Model Selection for Text Classification (Safikhani & Broneske, NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.45.pdf