MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles

Nicolas Devatine, Philippe Muller, Chloé Braud


Abstract
This paper describes our approach to Subtask 1 “News Genre Categorization” of SemEval-2023 Task 3 “Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup”, which aims to determine whether a given news article is an opinion piece, an objective report, or satire. We fine-tuned the domain-specific language model POLITICS, which was pre-trained with ideology-driven objectives on a large-scale corpus of more than 3.6M English political news articles. To use it in the task's multilingual setup, we added a pre-processing step that translates all documents into English. Our system ranked among the top systems overall in most languages and ranked 1st on the English dataset.
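The sketch below illustrates the general recipe the abstract describes: fine-tuning a POLITICS-style encoder for the 3-way genre task on articles already translated into English. It is a minimal illustration, not the authors' released code; the checkpoint name "launch/POLITICS" (the publicly released POLITICS model on the Hugging Face Hub), the label names, and all hyperparameters are assumptions for the example.

```python
# Minimal sketch: fine-tune a POLITICS-style encoder for 3-way news genre
# classification with Hugging Face transformers. Checkpoint name, labels,
# and hyperparameters are illustrative assumptions, not the paper's setup.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

LABELS = ["opinion", "reporting", "satire"]  # assumed label names

tokenizer = AutoTokenizer.from_pretrained("launch/POLITICS")
model = AutoModelForSequenceClassification.from_pretrained(
    "launch/POLITICS", num_labels=len(LABELS)
)

class NewsDataset(torch.utils.data.Dataset):
    """Articles are assumed to have been translated into English beforehand,
    mirroring the translation pre-processing step described in the abstract."""
    def __init__(self, texts, label_ids):
        self.enc = tokenizer(texts, truncation=True, max_length=512,
                             padding="max_length")
        self.label_ids = label_ids

    def __len__(self):
        return len(self.label_ids)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.label_ids[i])
        return item

# Placeholder data; replace with the (translated) task articles and gold labels.
train_texts = ["Example translated news article text."]
train_label_ids = [1]
train_ds = NewsDataset(train_texts, train_label_ids)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="genre-clf",
                           num_train_epochs=5,
                           per_device_train_batch_size=8,
                           learning_rate=2e-5),
    train_dataset=train_ds,
)
trainer.train()
```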
Anthology ID:
2023.semeval-1.14
Volume:
Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
108–113
URL:
https://aclanthology.org/2023.semeval-1.14
Cite (ACL):
Nicolas Devatine, Philippe Muller, and Chloé Braud. 2023. MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles. In Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 108–113, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles (Devatine et al., SemEval 2023)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2023.semeval-1.14.pdf