Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions

Qian Ruan, Ilia Kuznetsov, Iryna Gurevych


Abstract
Classification is a core NLP task architecture with many potential applications. While large language models (LLMs) have brought substantial advancements in text generation, their potential for enhancing classification tasks remains underexplored. To address this gap, we propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task. Our extensive experiments and systematic comparisons with various training approaches and a representative selection of LLMs yield new insights into their application for EIC. We investigate the generalizability of these findings on five further classification tasks. To demonstrate the proposed methods and address the data shortage for empirical edit analysis, we use our best-performing EIC model to create Re3-Sci2.0, a new large-scale dataset of 1,780 scientific document revisions with over 94k labeled edits. The quality of the dataset is assessed through human evaluation. The new dataset enables an in-depth empirical study of human editing behavior in academic writing. We make our experimental framework, models and data publicly available.
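The abstract distinguishes generation-based from encoding-based classification with LLMs. The toy sketch below only illustrates that distinction and is not the authors' implementation. The label set, both function names, and the hand-written weights are hypothetical. In practice, the generated string and the hidden vector would come from a fine-tuned LLM.

```python
import math

# Hypothetical label set, for illustration only (not the paper's taxonomy).
LABELS = ["grammar", "clarity", "content"]


def classify_by_generation(generated_text, labels=LABELS, fallback="content"):
    """Generation-based: the model emits free text and we parse a label from it."""
    text = generated_text.strip().lower()
    for label in labels:
        if label in text:
            return label
    # The generated text named no known label, so fall back to a default.
    return fallback


def classify_by_encoding(hidden, weights, biases, labels=LABELS):
    """Encoding-based: a linear head maps a hidden vector to label probabilities."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Stand-ins for model outputs: a generated string and a 2-dimensional hidden state.
gen_label = classify_by_generation("Label: Clarity")
enc_label, enc_probs = classify_by_encoding(
    hidden=[1.0, 0.0],
    weights=[[0.1, 0.0], [2.0, 0.0], [0.5, 0.0]],
    biases=[0.0, 0.0, 0.0],
)
```

The generation-based route needs a parsing step and a fallback for outputs that name no valid label, while the encoding-based route always returns a distribution over the fixed label set.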
Anthology ID:
2024.emnlp-main.839
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15049–15067
URL:
https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.839/
DOI:
10.18653/v1/2024.emnlp-main.839
Cite (ACL):
Qian Ruan, Ilia Kuznetsov, and Iryna Gurevych. 2024. Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15049–15067, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions (Ruan et al., EMNLP 2024)
PDF:
https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.839.pdf
Software:
 2024.emnlp-main.839.software.zip
Data:
 2024.emnlp-main.839.data.zip