@inproceedings{devatine-etal-2023-melodi,
title = "{MELODI} at {S}em{E}val-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles",
author = "Devatine, Nicolas and
Muller, Philippe and
Braud, Chlo{\'e}",
editor = {Ojha, Atul Kr. and
Do{\u{g}}ru{\"o}z, A. Seza and
Da San Martino, Giovanni and
Tayyar Madabushi, Harish and
Kumar, Ritesh and
Sartori, Elisa},
booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.semeval-1.14/",
doi = "10.18653/v1/2023.semeval-1.14",
pages = "108--113",
abstract = "This paper describes our approach to Subtask 1 {\textquotedblleft}News Genre Categorization{\textquotedblright} of SemEval-2023 Task 3 {\textquotedblleft}Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup{\textquotedblright}, which aims to determine whether a given news article is an opinion piece, an objective report, or satirical. We fine-tuned the domain-specific language model POLITICS, which was pre-trained on a large-scale dataset of more than 3.6M English political news articles following ideology-driven pre-training objectives. In order to use it in the multilingual setup of the task, we added as a pre-processing step the translation of all documents into English. Our system ranked among the top systems overall in most language, and ranked 1st on the English dataset."
}