Rudra Roy

2024

Human vs Machine: An Automated Machine-Generated Text Detection Approach
Urwah Jawaid | Rudra Roy | Pritam Pal | Srijani Debnath | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

With the advancement of natural language processing (NLP) and sophisticated Large Language Models (LLMs), distinguishing human-written text from machine-generated text has become difficult. This paper presents a systematic approach to separating machine-generated text from human-written text that combines a transformer-based model with a textual feature-based post-processing technique. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score, and human phrase count) from human-written and machine-generated texts separately, and trained three machine learning models (SVM, Random Forest, and XGBoost) on these scores. In addition to these traditional machine learning models, we explored BiLSTM and the transformer-based DistilBERT model to improve classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.
