Zero-shot Topical Text Classification with LLMs - an Experimental Study
Shai Gretz, Alon Halfon, Ilya Shnayderman, Orith Toledo-Ronen, Artem Spector, Lena Dankin, Yannis Katsis, Ofir Arviv, Yoav Katz, Noam Slonim, Liat Ein-Dor
Abstract
Topical Text Classification (TTC) is an ancient, yet timely research area in natural language processing, with many practical applications. The recent dramatic advancements in large LMs raise the question of how well these models can perform on this task in a zero-shot scenario. Here, we share a first comprehensive study, comparing the zero-shot performance of a variety of LMs over TTC23, a large benchmark collection of 23 publicly available TTC datasets, covering a wide range of domains and styles. In addition, we leverage this new TTC benchmark to create LMs that are specialized in TTC, by fine-tuning these LMs over a subset of the datasets and evaluating their performance over the remaining, held-out datasets. We show that the TTC-specialized LMs obtain the top performance on our benchmark, by a significant margin. Our code and model are made available for the community. We hope that the results presented in this work will serve as a useful guide for practitioners interested in topical text classification.
- Anthology ID:
- 2023.findings-emnlp.647
- Original:
- 2023.findings-emnlp.647v1
- Version 2:
- 2023.findings-emnlp.647v2
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9647–9676
- URL:
- https://aclanthology.org/2023.findings-emnlp.647
- DOI:
- 10.18653/v1/2023.findings-emnlp.647
- Cite (ACL):
- Shai Gretz, Alon Halfon, Ilya Shnayderman, Orith Toledo-Ronen, Artem Spector, Lena Dankin, Yannis Katsis, Ofir Arviv, Yoav Katz, Noam Slonim, and Liat Ein-Dor. 2023. Zero-shot Topical Text Classification with LLMs - an Experimental Study. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9647–9676, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Zero-shot Topical Text Classification with LLMs - an Experimental Study (Gretz et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.647.pdf