Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments

Marc Felix Brinner, Sina Zarrieß


Abstract
This study explores strategies for efficiently classifying scientific full texts using both small, BERT-based models and local large language models such as Llama-3.1 8B. We focus on developing methods for selecting subsets of input sentences that reduce input size while also improving classification performance. To this end, we compile a novel dataset of full-text scientific papers from the field of invasion biology, specifically papers addressing the impacts of invasive species. These papers are aligned with publicly available impact assessments created by researchers for the International Union for Conservation of Nature (IUCN). Through extensive experimentation, we demonstrate that various sources, such as human evidence annotations, LLM-generated annotations, or explainability scores, can be used to train sentence selection models. These models improve the performance of both encoder- and decoder-based language models while increasing efficiency by reducing input length, and they yield better results even when compared to models like ModernBERT that can process the complete text as input. Additionally, we find that repeated sampling of shorter inputs is a very effective strategy that, at a slightly increased cost, can further improve classification performance.
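The abstract does not describe how sentence selection or repeated sampling are implemented. The following is only a minimal sketch under our own assumptions: each sentence already has a non-negative relevance score (for example, from a trained selection model), a shorter input is formed either by keeping the top-k sentences or by drawing k sentences with score-weighted sampling, and predictions over several sampled inputs are aggregated by majority vote. The function names, the weighted sampling scheme (Efraimidis-Spirakis keys), the voting rule, and the toy data and classifier are all hypothetical illustrations, not the authors' method.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def select_top_k(sentences: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Keep the k highest-scoring sentences, restored to document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def sample_subset(sentences: Sequence[str], scores: Sequence[float], k: int,
                  rng: random.Random) -> List[str]:
    """Hypothetical scheme: draw k sentences without replacement, weighted by score.

    Uses Efraimidis-Spirakis keys u ** (1 / w); the k largest keys form the sample.
    Scores are assumed non-negative; zero scores are clamped to a tiny weight.
    """
    keys = [rng.random() ** (1.0 / max(s, 1e-9)) for s in scores]
    ranked = sorted(range(len(sentences)), key=lambda i: keys[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def classify_by_voting(sentences: Sequence[str], scores: Sequence[float],
                       classify: Callable[[str], str], k: int,
                       n_samples: int, seed: int = 0) -> str:
    """Classify several sampled short inputs and return the majority label."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        subset = sample_subset(sentences, scores, k, rng)
        votes[classify(" ".join(subset))] += 1
    return votes.most_common(1)[0][0]


# Toy example data (invented for illustration only).
sentences = [
    "The species was introduced in 1990.",
    "Native amphibian populations declined sharply.",
    "Sampling took place in spring.",
    "Predation caused local extinctions.",
]
scores = [0.1, 0.9, 0.05, 0.8]


def toy_classifier(text: str) -> str:
    # Stand-in for a BERT-style or LLM classifier.
    return "impact" if ("declined" in text or "extinction" in text) else "no_impact"


print(select_top_k(sentences, scores, 2))
print(classify_by_voting(sentences, scores, toy_classifier, k=2, n_samples=5))
```

Keeping the selected sentences in their original document order is a design choice here, intended to preserve local context for the downstream classifier. Raising `n_samples` trades extra inference cost for a more stable vote, which loosely mirrors the cost/performance trade-off the abstract mentions.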
Anthology ID:
2025.nlp4ecology-1.20
Volume:
Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Valerio Basile, Cristina Bosco, Francesca Grasso, Muhammad Okky Ibrohim, Maria Skeppstedt, Manfred Stede
Venues:
NLP4Ecology | WS
Publisher:
University of Tartu Library
Pages:
94–103
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.nlp4ecology-1.20/
Cite (ACL):
Marc Felix Brinner and Sina Zarrieß. 2025. Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments. In Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025), pages 94–103, Tallinn, Estonia. University of Tartu Library.
Cite (Informal):
Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments (Brinner & Zarrieß, NLP4Ecology 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.nlp4ecology-1.20.pdf