Abstract
This paper describes our approach to Subtask 1 “News Genre Categorization” of SemEval-2023 Task 3 “Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup”, which aims to determine whether a given news article is an opinion piece, an objective report, or satirical. We fine-tuned the domain-specific language model POLITICS, which was pre-trained on a large-scale dataset of more than 3.6M English political news articles following ideology-driven pre-training objectives. To use it in the multilingual setup of the task, we added a pre-processing step that translates all documents into English. Our system ranked among the top systems overall in most languages, and ranked 1st on the English dataset.
- Anthology ID:
- 2023.semeval-1.14
- Volume:
- Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 108–113
- URL:
- https://aclanthology.org/2023.semeval-1.14
- Cite (ACL):
- Nicolas Devatine, Philippe Muller, and Chloé Braud. 2023. MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 108–113, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles (Devatine et al., SemEval 2023)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2023.semeval-1.14.pdf
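The translate-then-classify pipeline summarized in the abstract (translate every non-English article into English, then label it with a single English-only classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `translate` and `classify` are hypothetical callables standing in for a machine-translation system and for the fine-tuned POLITICS classifier, whose exact setups are not described here, and the label strings `opinion`, `reporting`, `satire` are assumed names for the three genres.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumed label names for the three genres of Subtask 1.
LABELS = ("opinion", "reporting", "satire")


@dataclass
class Article:
    text: str
    lang: str  # language code of the original article, e.g. "en", "fr"


def translate_then_classify(
    articles: Sequence[Article],
    # Hypothetical MT hook: (text, source_language) -> English text.
    translate: Callable[[str, str], str],
    # Hypothetical classifier hook: English text -> one score per label in LABELS.
    classify: Callable[[str], Sequence[float]],
) -> List[str]:
    """Translate each non-English article to English, then pick the top-scoring genre."""
    predictions: List[str] = []
    for article in articles:
        # Pre-processing step: English articles are passed through untranslated.
        if article.lang == "en":
            english_text = article.text
        else:
            english_text = translate(article.text, article.lang)
        scores = classify(english_text)
        if len(scores) != len(LABELS):
            raise ValueError("classify must return exactly one score per label")
        # Choose the label with the highest score (first one wins on ties).
        best = max(range(len(LABELS)), key=lambda i: scores[i])
        predictions.append(LABELS[best])
    return predictions
```

Keeping translation and classification as injected functions makes the design point of the approach explicit: only one English model has to be fine-tuned, and every language reaches it through the same pre-processing step.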