Team MGTD4ADL at SemEval-2024 Task 8: Leveraging (Sentence) Transformer Models with Contrastive Learning for Identifying Machine-Generated Text

Huixin Chen; Jan Büssing; David Rügamer; Ercong Nie

doi:10.18653/v1/2024.semeval-1.245

Abstract

This paper outlines our approach to SemEval-2024 Task 8 (Subtask B), which focuses on distinguishing machine-generated text from human-written text while also identifying the text source, i.e., the Large Language Model (LLM) that generated the target text. Our detection system is built on Transformer-based techniques and leverages various pre-trained language models (PLMs), including sentence transformer models. Additionally, we incorporate Contrastive Learning (CL) into the classifier to improve its detection capabilities, and we employ data augmentation methods. Our best configuration, a sentence transformer model combined with CL, achieves an accuracy of 76.96% on the competition test set.
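The abstract does not state which contrastive objective is added to the classifier. The sketch below illustrates one common way to combine CL with classification. It assumes a supervised contrastive (SupCon-style) loss over sentence embeddings, added to the standard cross-entropy on the classifier logits. The function names, the weighting parameter `alpha`, and the `temperature` value are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    # Numerically stable log-softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the gold class for each example.
    return -log_probs[np.arange(len(labels)), labels].mean()


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    # Hypothetical SupCon-style loss: pull same-label embeddings together
    # and push different-label embeddings apart.
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = positives.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        # No anchor has a same-label partner in this batch.
        return 0.0
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    # Average log-probability over each anchor's positives.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()


def combined_loss(logits, embeddings, labels, alpha=0.5, temperature=0.1):
    # Hypothetical weighted sum of classification and contrastive terms.
    ce = softmax_cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(embeddings, labels, temperature)
    return (1 - alpha) * ce + alpha * scl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 0, 1, 1, 2, 2])   # toy source labels
    embeddings = rng.normal(size=(6, 16))    # stand-in sentence embeddings
    logits = rng.normal(size=(6, 6))         # stand-in classifier outputs
    print(combined_loss(logits, embeddings, labels))
```

Under this assumption, the contrastive term is near zero when embeddings cluster by source label and grows when texts from different sources land close together. This gives the encoder a training signal beyond the cross-entropy on the final logits.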

Anthology ID:: 2024.semeval-1.245
Volume:: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:: June
Year:: 2024
Address:: Mexico City, Mexico
Editors:: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 1711–1718
URL:: https://aclanthology.org/2024.semeval-1.245
DOI:: 10.18653/v1/2024.semeval-1.245
Cite (ACL):: Huixin Chen, Jan Büssing, David Rügamer, and Ercong Nie. 2024. Team MGTD4ADL at SemEval-2024 Task 8: Leveraging (Sentence) Transformer Models with Contrastive Learning for Identifying Machine-Generated Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1711–1718, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):: Team MGTD4ADL at SemEval-2024 Task 8: Leveraging (Sentence) Transformer Models with Contrastive Learning for Identifying Machine-Generated Text (Chen et al., SemEval 2024)
PDF:: https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.245.pdf
Supplementary material:: 2024.semeval-1.245.SupplementaryMaterial.zip
Supplementary material:: 2024.semeval-1.245.SupplementaryMaterial.txt