Human vs Machine: An Automated Machine-Generated Text Detection Approach
Urwah Jawaid, Rudra Roy, Pritam Pal, Srijani Debnath, Dipankar Das, Sivaji Bandyopadhyay
Abstract
With advances in natural language processing (NLP) and increasingly sophisticated Large Language Models (LLMs), distinguishing human-written text from machine-generated text has become difficult. This paper presents a systematic approach to separating machine-generated text from human-written text that combines a transformer-based model with a textual feature-based post-processing technique. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score, and human phrase count) from human-written and machine-generated texts separately, and trained three machine learning models (SVM, Random Forest, and XGBoost) on these scores. In addition to these traditional machine learning models, we explored BiLSTM and the transformer-based DistilBERT model to improve classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.
- Anthology ID:
- 2024.icon-1.24
- Volume:
- Proceedings of the 21st International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2024
- Address:
- AU-KBC Research Centre, Chennai, India
- Editors:
- Sobha Lalitha Devi, Karunesh Arora
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 215–223
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.icon-1.24/
- Cite (ACL):
- Urwah Jawaid, Rudra Roy, Pritam Pal, Srijani Debnath, Dipankar Das, and Sivaji Bandyopadhyay. 2024. Human vs Machine: An Automated Machine-Generated Text Detection Approach. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 215–223, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Human vs Machine: An Automated Machine-Generated Text Detection Approach (Jawaid et al., ICON 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.icon-1.24.pdf
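
The abstract names five textual features but does not define how each is computed. As a rough illustration only, the sketch below computes simple proxies for three of them (stop word score, unique word score, and human phrase count) using the Python standard library. The word lists, the ratio definitions, and the names `STOP_WORDS`, `HUMAN_PHRASES`, `tokenize`, and `extract_features` are all assumptions made for this example, not the authors' implementation. Readability and spelling or grammatical error counts are left out because they would need external tools.

```python
import re

# Hypothetical, deliberately tiny lexicons chosen for illustration only;
# the paper's actual stop word list and "human phrase" list are not given.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "i"}
HUMAN_PHRASES = ("to be honest", "i think", "you know")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Return three illustrative feature values for one text.

    Assumed definitions (not taken from the paper):
      stop_word_score    = stop word tokens / all tokens
      unique_word_score  = distinct tokens / all tokens
      human_phrase_count = occurrences of any listed phrase
    """
    tokens = tokenize(text)
    n = len(tokens) or 1  # avoid division by zero on empty input
    lowered = text.lower()
    return {
        "stop_word_score": sum(t in STOP_WORDS for t in tokens) / n,
        "unique_word_score": len(set(tokens)) / n,
        "human_phrase_count": sum(lowered.count(p) for p in HUMAN_PHRASES),
    }


sample = "I think the cat is in the hat, to be honest."
features = extract_features(sample)
print(features)
```

Feature dictionaries like this one could be turned into numeric vectors and passed to a classifier such as an SVM, Random Forest, or XGBoost model, which is the general pattern the abstract describes.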