Hitachi at SemEval-2020 Task 11: An Empirical Study of Pre-Trained Transformer Family for Propaganda Detection
Gaku Morio, Terufumi Morishita, Hiroaki Ozaki, Toshinori Miyoshi
Abstract
In this paper, we present our system for SemEval-2020 Task 11, in which we tackle propaganda span identification (SI) and technique classification (TC). We investigate fine-tuning heterogeneous pre-trained language models (PLMs) such as BERT, GPT-2, XLNet, XLM, RoBERTa, and XLM-RoBERTa for SI and TC. In large-scale experiments, we found that each of the language models has its own characteristic properties and that an ensemble of them is promising. Finally, the ensemble model ranked 1st amongst 35 teams for SI and 3rd amongst 31 teams for TC.
- Anthology ID:
- 2020.semeval-1.228
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Venue:
- SemEval
- SIGs:
- SIGLEX | SIGSEM
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1739–1748
- URL:
- https://aclanthology.org/2020.semeval-1.228
- DOI:
- 10.18653/v1/2020.semeval-1.228
- Cite (ACL):
- Gaku Morio, Terufumi Morishita, Hiroaki Ozaki, and Toshinori Miyoshi. 2020. Hitachi at SemEval-2020 Task 11: An Empirical Study of Pre-Trained Transformer Family for Propaganda Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1739–1748, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- Hitachi at SemEval-2020 Task 11: An Empirical Study of Pre-Trained Transformer Family for Propaganda Detection (Morio et al., SemEval 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.semeval-1.228.pdf
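The abstract describes fine-tuning several pre-trained language models for propaganda span identification. As a rough illustration only (not the authors' implementation), the sketch below shows how one such PLM could be fine-tuned for SI cast as binary token classification, assuming the HuggingFace Transformers library; the model choice, label scheme, and hyperparameters are placeholders.

```python
# Minimal, illustrative sketch (not the authors' code): fine-tuning one of the
# PLMs named in the abstract (here RoBERTa, via HuggingFace Transformers, which
# is an assumption) for propaganda span identification as token classification.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "roberta-base"  # any PLM from the abstract's list could be swapped in

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy example: one sentence with hypothetical word-level span labels
# (1 = inside a propaganda span, 0 = outside).
words = ["They", "are", "destroying", "our", "great", "nation"]
word_labels = [0, 0, 1, 1, 1, 1]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Propagate each word's label to its sub-word tokens; special tokens get -100
# so the cross-entropy loss ignores them.
labels = []
for word_id in encoding.word_ids(batch_index=0):
    labels.append(-100 if word_id is None else word_labels[word_id])
labels = torch.tensor([labels])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**encoding, labels=labels)  # loss over token-level labels
outputs.loss.backward()
optimizer.step()
print(f"toy training loss: {outputs.loss.item():.4f}")
```

In the same spirit, an ensemble such as the one the abstract mentions could combine per-token predictions from several fine-tuned PLMs (e.g., by averaging logits or voting), though the paper itself should be consulted for the actual ensembling strategy.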