Alberto Rodero Peña


2024

I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles
Alberto Rodero Peña | Jacinto Mata Vazquez | Victoria Pachón Álvarez
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

With the rise of AI-based text generators, the need for effective detection mechanisms has become paramount. This paper presents new techniques for building adaptable models and optimizing training aspects for identifying synthetically produced texts across multiple generators and domains. The study, divided into binary and multilabel classification tasks, avoids overfitting through strategic training data limitation. A key innovation is the incorporation of multimodal models that blend numerical text features with conventional NLP approaches. The work also delves into optimizing ensemble model combinations via various voting methods, focusing on accuracy as the official metric. The optimized ensemble strategy demonstrates significant efficacy in both subtasks, highlighting the potential of multimodal and ensemble methods in enhancing the robustness of detection systems against emerging text generators.
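The ensemble optimization described in the abstract can be illustrated with a minimal sketch for the binary subtask. The abstract does not specify which voting methods or search procedure the authors used, so everything here is an assumption for illustration: the function names (`hard_vote`, `soft_vote`, `best_ensemble`), the exhaustive search over model subsets, the 0.5 decision threshold, and the toy probabilities are all hypothetical. The sketch tries every subset of models with both majority (hard) and probability-averaging (soft) voting, and keeps the combination with the highest accuracy on a held-out set.

```python
from collections import Counter
from itertools import combinations


def hard_vote(label_lists):
    """Majority vote per sample over the models' predicted labels."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*label_lists)]


def soft_vote(prob_lists, threshold=0.5):
    """Average the models' probabilities per sample, then threshold."""
    voted = []
    for sample in zip(*prob_lists):
        mean = sum(sample) / len(sample)
        voted.append(1 if mean >= threshold else 0)
    return voted


def accuracy(pred, gold):
    """Fraction of samples where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def best_ensemble(model_probs, gold, threshold=0.5):
    """Search all model subsets and both voting methods for best accuracy.

    model_probs maps a (hypothetical) model name to its predicted
    probability of "machine-generated" for each held-out sample.
    Returns (accuracy, subset_of_names, voting_method).
    """
    best = (-1.0, None, None)
    names = sorted(model_probs)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            probs = [model_probs[n] for n in subset]
            labels = [[1 if p >= threshold else 0 for p in ps] for ps in probs]
            candidates = (
                ("hard", hard_vote(labels)),
                ("soft", soft_vote(probs, threshold)),
            )
            for method, pred in candidates:
                acc = accuracy(pred, gold)
                if acc > best[0]:  # strict: keep the first (smallest) best subset
                    best = (acc, subset, method)
    return best


# Toy held-out data (made up for illustration, not from the paper).
gold = [1, 0, 1, 0]
model_probs = {
    "a": [0.9, 0.2, 0.4, 0.1],
    "b": [0.6, 0.6, 0.9, 0.3],
    "c": [0.3, 0.1, 0.8, 0.7],
}
result = best_ensemble(model_probs, gold)
print(result)
```

Exhaustive search is only practical for a handful of models, since the number of subsets grows as 2^n; with a larger pool, a greedy or randomized search would be a reasonable substitute. Selecting the ensemble on held-out data rather than training data keeps the accuracy estimate from being inflated by the selection itself.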