I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles
Alberto Rodero Peña, Jacinto Mata Vazquez, Victoria Pachón Álvarez
Abstract
With the rise of AI-based text generators, the need for effective detection mechanisms has become paramount. This paper presents new techniques for building adaptable models and optimizing training aspects for identifying synthetically produced texts across multiple generators and domains. The study, divided into binary and multilabel classification tasks, avoids overfitting through strategic training data limitation. A key innovation is the incorporation of multimodal models that blend numerical text features with conventional NLP approaches. The work also delves into optimizing ensemble model combinations via various voting methods, focusing on accuracy as the official metric. The optimized ensemble strategy demonstrates significant efficacy in both subtasks, highlighting the potential of multimodal and ensemble methods in enhancing the robustness of detection systems against emerging text generators.- Anthology ID:
- 2024.semeval-1.121
- Volume:
- Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 845–852
- Language:
- URL:
- https://aclanthology.org/2024.semeval-1.121
- DOI:
- 10.18653/v1/2024.semeval-1.121
- Cite (ACL):
- Alberto Rodero Peña, Jacinto Mata Vazquez, and Victoria Pachón Álvarez. 2024. I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 845–852, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles (Rodero Peña et al., SemEval 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.121.pdf