Abstract
Automatically organizing scholarly literature is a necessary and challenging task. Assigning key concepts to scientific research publications enables researchers, policymakers, and the general public to search for and discover relevant research literature. The organization of scientific research evolves with new discoveries and publications, requiring an up-to-date and scalable text classification model. Additionally, scientific research publications benefit from multi-label classification, particularly with more fine-grained sub-domains. Prior work has focused on classifying scientific publications from one research area (e.g., computer science), referencing static concept descriptions, and implementing an English-only classification model. We propose a multi-label classification model that can be implemented in non-English languages, across all of scientific literature, with updatable concept descriptions.
- Anthology ID:
- 2022.sdp-1.12
- Volume:
- Proceedings of the Third Workshop on Scholarly Document Processing
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Venue:
- sdp
- Publisher:
- Association for Computational Linguistics
- Pages:
- 105–114
- URL:
- https://aclanthology.org/2022.sdp-1.12
- Cite (ACL):
- Autumn Toney and James Dunham. 2022. Multi-label Classification of Scientific Research Documents Across Domains and Languages. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 105–114, Gyeongju, Republic of Korea. Association for Computational Linguistics.
- Cite (Informal):
- Multi-label Classification of Scientific Research Documents Across Domains and Languages (Toney & Dunham, sdp 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.12.pdf