AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text

Renhua Gu, Xiangfeng Meng


Abstract
SemEval-2024 Task 8 provides a challenge to detect human-written and machine-generated text. There are 3 subtasks for different detection scenarios. This paper proposes a system that mainly deals with Subtask B. It aims to detect if given full text is written by human or is generated by a specific Large Language Model (LLM), which is actually a multi-class text classification task. Our team AISPACE conducted a systematic study of fine-tuning transformer-based models, including encoder-only, decoder-only and encoder-decoder models. We compared their performance on this task and identified that encoder-only models performed exceptionally well. We also applied a weighted Cross Entropy loss function to address the issue of data imbalance of different class samples. Additionally, we employed soft-voting strategy over multi-models ensemble to enhance the reliability of our predictions. Our system ranked top 1 in Subtask B, which sets a state-of-the-art benchmark for this new challenge.
Anthology ID:
2024.semeval-1.212
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
1476–1481
Language:
URL:
https://aclanthology.org/2024.semeval-1.212
DOI:
Bibkey:
Cite (ACL):
Renhua Gu and Xiangfeng Meng. 2024. AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1476–1481, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text (Gu & Meng, SemEval 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.semeval-1.212.pdf
Supplementary material:
 2024.semeval-1.212.SupplementaryMaterial.txt