MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles

Nicolas Devatine, Philippe Muller, Chloé Braud


Abstract
This paper describes our approach to Subtask 1 “News Genre Categorization” of SemEval-2023 Task 3 “Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup”, which aims to determine whether a given news article is an opinion piece, an objective report, or satirical. We fine-tuned the domain-specific language model POLITICS, which was pre-trained on a large-scale dataset of more than 3.6M English political news articles following ideology-driven pre-training objectives. To use it in the multilingual setup of the task, we added a pre-processing step that translates all documents into English. Our system ranked among the top systems overall in most languages, and ranked 1st on the English dataset.
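The translate-then-classify flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label strings, and the two toy components are hypothetical stand-ins, where a real system would plug in a machine translation model and a fine-tuned classifier such as POLITICS.

```python
from typing import Callable, Dict, List

# Assumed label names for the three genres named in the abstract.
LABELS = ["opinion", "reporting", "satire"]


def classify_articles(
    articles: List[Dict[str, str]],
    translate: Callable[[str, str], str],
    classify: Callable[[str], str],
) -> List[str]:
    """Translate non-English articles into English, then classify each one."""
    predictions = []
    for article in articles:
        text = article["text"]
        # Pre-processing step: only non-English documents are translated.
        if article["lang"] != "en":
            text = translate(text, article["lang"])
        label = classify(text)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label}")
        predictions.append(label)
    return predictions


def toy_translate(text: str, source_lang: str) -> str:
    """Hypothetical placeholder for a machine translation system."""
    return f"[{source_lang}->en] {text}"


def toy_classify(english_text: str) -> str:
    """Hypothetical placeholder for a fine-tuned English classifier."""
    return "opinion" if "i think" in english_text.lower() else "reporting"


if __name__ == "__main__":
    demo = [
        {"lang": "en", "text": "I think this policy is wrong."},
        {"lang": "fr", "text": "Le parlement a voté la loi."},
    ]
    print(classify_articles(demo, toy_translate, toy_classify))
```

Passing the translator and classifier as callables keeps the pipeline independent of any particular model, so the same loop works for every language in the task once an English-only classifier is available.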
Anthology ID:
2023.semeval-1.14
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
108–113
URL:
https://aclanthology.org/2023.semeval-1.14
DOI:
10.18653/v1/2023.semeval-1.14
Cite (ACL):
Nicolas Devatine, Philippe Muller, and Chloé Braud. 2023. MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 108–113, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles (Devatine et al., SemEval 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.semeval-1.14.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2023.semeval-1.14.mp4