UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments

Kwabena Odame Akomeah; Udo Kruschwitz; Bernd Ludwig

Abstract

In this paper, we report on our approach to the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments for the German language. We submitted three runs for each subtask. Each run is an ensemble of three models that feed contextual embeddings from pre-trained language models into SVM and neural-network-based classifiers. We include language-specific as well as language-agnostic language models, both with and without fine-tuning. For the runs we submitted, we observe that the SVM models overfitted the training data and that this affected the aggregation method (simple majority voting) of the ensembles: performance on the test set is lower than on the training set. Exploring the issue of overfitting, we uncovered that, due to a bug in the pipeline, the runs we submitted had been trained not on the full training set but only on a small training set. Therefore, in this paper we also include the results we obtain when training on the full training set, which demonstrate the power of ensembles.
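
The aggregation step named in the abstract, simple majority voting over three models, can be illustrated with a minimal sketch. This is not the authors' code: the function name `majority_vote`, the variable names, and the example predictions are all hypothetical, and binary labels (1 = positive class, 0 = negative class) are assumed.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model label lists into one label per comment by simple majority."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_comments = len(predictions[0])
    if any(len(p) != n_comments for p in predictions):
        raise ValueError("all models must label the same number of comments")

    combined = []
    # zip(*predictions) yields, for each comment, the tuple of votes from all models.
    for votes in zip(*predictions):
        # With an odd number of models and binary labels, a tie cannot occur.
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three models on four comments (illustrative values only).
svm_preds = [1, 0, 1, 0]
nn_preds = [1, 1, 0, 0]
third_preds = [0, 1, 1, 0]

ensemble_preds = majority_vote([svm_preds, nn_preds, third_preds])
```

Under this scheme, whenever two of the three members make the same mistake the vote reproduces it, which is one way an overfitted member can affect the ensemble's output.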

Anthology ID: 2021.germeval-1.14
Volume: Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
Month: September
Year: 2021
Address: Duesseldorf, Germany
Editors: Julian Risch, Anke Stoll, Lena Wilms, Michael Wiegand
Venue: GermEval
Publisher: Association for Computational Linguistics
Pages: 95–99
URL: https://aclanthology.org/2021.germeval-1.14
Cite (ACL): Kwabena Odame Akomeah, Udo Kruschwitz, and Bernd Ludwig. 2021. UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments. In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, pages 95–99, Duesseldorf, Germany. Association for Computational Linguistics.
Cite (Informal): UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments (Akomeah et al., GermEval 2021)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/2021.germeval-1.14.pdf
