AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text

Renhua Gu; Xiangfeng Meng

doi:10.18653/v1/2024.semeval-1.212


Abstract

SemEval-2024 Task 8 poses the challenge of distinguishing human-written from machine-generated text, with three subtasks covering different detection scenarios. This paper proposes a system that mainly addresses Subtask B, which aims to determine whether a given full text was written by a human or generated by a specific Large Language Model (LLM); this is effectively a multi-class text classification task. Our team, AISPACE, conducted a systematic study of fine-tuning transformer-based models, including encoder-only, decoder-only, and encoder-decoder models. We compared their performance on this task and found that encoder-only models performed exceptionally well. We also applied a weighted cross-entropy loss function to address the class imbalance in the data. Additionally, we employed a soft-voting strategy over a multi-model ensemble to enhance the reliability of our predictions. Our system ranked first in Subtask B, setting a state-of-the-art benchmark for this new challenge.
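The two techniques named in the abstract, a class-weighted cross-entropy loss and soft voting over an ensemble, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the class weights were chosen or how the models' outputs were combined, so the inverse-frequency weighting, the unweighted averaging of probabilities, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme: total / (num_classes * class_count).

    Rarer classes receive larger weights; a class with no samples gets 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Cross-entropy where each sample is scaled by the weight of its true class.

    The result is normalised by the sum of the applied weights.
    """
    weighted_nll = 0.0
    weight_sum = 0.0
    for logits, label in zip(batch_logits, labels):
        prob_true = softmax(logits)[label]
        weight = class_weights[label]
        weighted_nll += -weight * math.log(prob_true)
        weight_sum += weight
    return weighted_nll / weight_sum


def soft_vote(per_model_probs):
    """Average class probabilities across models and pick the top class.

    per_model_probs holds one probability vector per model for a single text.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    averaged = [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged
```

In this sketch, soft voting can disagree with a majority (hard) vote: for the hypothetical probability vectors `[0.6, 0.4]`, `[0.1, 0.9]`, and `[0.55, 0.45]`, two of the three models prefer class 0, yet the averaged probabilities favour class 1 because the second model is much more confident.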

Anthology ID:: 2024.semeval-1.212
Volume:: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:: June
Year:: 2024
Address:: Mexico City, Mexico
Editors:: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 1476–1481
URL:: https://aclanthology.org/2024.semeval-1.212
DOI:: 10.18653/v1/2024.semeval-1.212
Cite (ACL):: Renhua Gu and Xiangfeng Meng. 2024. AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1476–1481, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):: AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text (Gu & Meng, SemEval 2024)
PDF:: https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.212.pdf
Supplementary material:: 2024.semeval-1.212.SupplementaryMaterial.txt
