Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation

Alperen Ozturk, Şaziye Betül Özateş, Sophia Bahar Root, Angela Violi, Nicholas Kotov, J. Scott VanEpps, Emine Sumeyra Turali Emre


Abstract
Antimicrobial resistance is a growing global health threat, driving interest in nanoparticle-based alternatives to conventional antibiotics. Inorganic nanoparticles (NPs) with intrinsic antibacterial properties show significant promise; however, efficiently identifying relevant studies in the rapidly expanding literature remains a major challenge. This identification step is crucial for computational approaches that aim to model and predict NP efficacy from physicochemical and structural features. In this study, we evaluate traditional machine learning and deep learning methods for classifying scientific abstracts in NP-based antimicrobial research. We introduce the “Antibacterial Inorganic NAnoparticles Dataset” (AINA), comprising 7,910 articles curated to distinguish studies of intrinsically antibacterial NPs from studies focusing on drug carriers or surface-bound applications. In our comparative evaluation, a fine-tuned BioBERT classifier achieved the highest macro F1 (0.82), while a lightweight SVM model with TF-IDF features remained competitive (0.78), highlighting the utility of such lightweight models in low-resource settings. AINA enables reproducible, large-scale identification of intrinsically bactericidal inorganic NPs. By reducing noise from non-intrinsic contexts, this work provides a foundation for mechanism-aware screening, database construction, and predictive modeling in antimicrobial NP research.
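The abstract compares the two classifiers by macro F1, which is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many abstracts it contains. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1`, the label names, and the toy predictions are hypothetical and chosen only for illustration; they are not the AINA label set, the paper's data, or the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    # F1 per class: 2*TP / (2*TP + FP + FN), defined as 0.0 when undefined.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores)


# Hypothetical toy labels (assumed for illustration, not taken from AINA).
y_true = ["intrinsic", "intrinsic", "carrier", "carrier", "carrier", "other"]
y_pred = ["intrinsic", "carrier", "carrier", "carrier", "other", "other"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because each class contributes equally to the average, a classifier that performs poorly on a rare class is penalized more under macro F1 than it would be under accuracy or micro-averaged F1.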
Anthology ID:
2026.eacl-long.20
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
454–465
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.20/
Cite (ACL):
Alperen Ozturk, Şaziye Betül Özateş, Sophia Bahar Root, Angela Violi, Nicholas Kotov, J. Scott VanEpps, and Emine Sumeyra Turali Emre. 2026. Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 454–465, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation (Ozturk et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.20.pdf