Dataset of News Articles with Provenance Metadata for Media Relevance Assessment

Tomas Peterka; Matyas Bohacek

Abstract

Out-of-context and misattributed imagery is the leading form of media manipulation in today’s misinformation and disinformation landscape. Existing methods that attempt to detect this practice often consider only whether the semantics of the imagery correspond to the text narrative. As a result, they miss manipulation as long as the depicted objects or scenes roughly match the narrative at hand. To tackle this, we introduce the News Media Provenance Dataset, a dataset of news articles with provenance-tagged images. We formulate two tasks on this dataset, location of origin relevance (LOR) and date and time of origin relevance (DTOR), and present baseline results for six large language models (LLMs). We find that zero-shot performance on LOR is promising, but performance on DTOR lags behind, leaving room for specialized architectures and future work.

Anthology ID:: 2025.nlp4pi-1.10
Volume:: Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Katherine Atwell, Laura Biester, Angana Borah, Daryna Dementieva, Oana Ignat, Neema Kotonya, Ziyi Liu, Ruyuan Wan, Steven Wilson, Jieyu Zhao
Venues:: NLP4PI | WS
Publisher:: Association for Computational Linguistics
Pages:: 114–127
URL:: https://preview.aclanthology.org/landing_page/2025.nlp4pi-1.10/
Cite (ACL):: Tomas Peterka and Matyas Bohacek. 2025. Dataset of News Articles with Provenance Metadata for Media Relevance Assessment. In Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI), pages 114–127, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Dataset of News Articles with Provenance Metadata for Media Relevance Assessment (Peterka & Bohacek, NLP4PI 2025)
PDF:: https://preview.aclanthology.org/landing_page/2025.nlp4pi-1.10.pdf