Werkzeug at SemEval-2024 Task 8: LLM-Generated Text Detection via Gated Mixture-of-Experts Fine-Tuning

Youlin Wu, Kaichun Wang, Kai Ma, Liang Yang, Hongfei Lin


Abstract
Recent advances in Large Language Models (LLMs) have pushed text generation close to human-level quality. This progress makes it newly challenging to distinguish LLM-generated text from human-written text. At present, most methods treat the problem as classification and address it by fine-tuning small language models. Unfortunately, small language models suffer from the anisotropy problem, in which encoded text embeddings become difficult to differentiate in the latent space. Moreover, LLMs can change language style with great versatility, which further complicates the classification task. To tackle these challenges, we propose Gated Mixture-of-Experts Fine-tuning (GMoEF) to detect LLM-generated text. GMoEF uses parametric whitening to normalize text embeddings, thereby mitigating the anisotropy problem. In addition, GMoEF employs a mixture-of-experts framework equipped with a gating router to capture features of LLM-generated text from multiple perspectives. GMoEF ranked 8th out of 70 teams. The source code is available at https://gitlab.com/sigrs/gmoef.
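The two ingredients named in the abstract, parametric whitening and a gated mixture of experts, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the assumption that each expert is a whitening layer of the form (x - b)W, the softmax gate, the dimensions, and all names (`whiten`, `gated_moe`, `random_matrix`) are hypothetical choices made for illustration only, and the weights are random rather than learned.

```python
import math
import random


def whiten(x, bias, weight):
    """Parametric whitening (assumed form): (x - bias) @ weight."""
    centered = [xi - bi for xi, bi in zip(x, bias)]
    d_out = len(weight[0])
    return [sum(c * row[j] for c, row in zip(centered, weight)) for j in range(d_out)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_moe(x, experts, gate_weight):
    """Mix the expert outputs using gate weights computed from the input x."""
    n_experts = len(experts)
    # Gating router (assumed): a linear map from the embedding to one score per expert.
    scores = [sum(xi * row[k] for xi, row in zip(x, gate_weight)) for k in range(n_experts)]
    gates = softmax(scores)
    # Each expert is its own whitening layer (an assumption, not stated in the abstract).
    outputs = [whiten(x, bias, weight) for bias, weight in experts]
    d_out = len(outputs[0])
    mixed = [sum(g * out[j] for g, out in zip(gates, outputs)) for j in range(d_out)]
    return mixed, gates


def random_matrix(rng, rows, cols):
    """Random stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_out, n_experts = 4, 3, 2
experts = [
    ([rng.uniform(-0.5, 0.5) for _ in range(d_in)], random_matrix(rng, d_in, d_out))
    for _ in range(n_experts)
]
gate_weight = random_matrix(rng, d_in, n_experts)

embedding = [0.2, -0.1, 0.4, 0.3]  # toy stand-in for an encoder's text embedding
mixed, gates = gated_moe(embedding, experts, gate_weight)
print("gate weights:", gates)
print("mixed embedding:", mixed)
```

In a real system the bias, weight, and gate parameters would be trained jointly with the classifier on top of the encoder; here they only show how the gate turns several whitened views of one embedding into a single mixed representation.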
Anthology ID:
2024.semeval-1.82
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
547–552
URL:
https://aclanthology.org/2024.semeval-1.82
Cite (ACL):
Youlin Wu, Kaichun Wang, Kai Ma, Liang Yang, and Hongfei Lin. 2024. Werkzeug at SemEval-2024 Task 8: LLM-Generated Text Detection via Gated Mixture-of-Experts Fine-Tuning. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 547–552, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Werkzeug at SemEval-2024 Task 8: LLM-Generated Text Detection via Gated Mixture-of-Experts Fine-Tuning (Wu et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/revert-3132-ingestion-checklist/2024.semeval-1.82.pdf
Supplementary material:
2024.semeval-1.82.SupplementaryMaterial.txt