Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions

Hsuvas Borkakoty, Luis Espinosa-Anke


Abstract
Automated content moderation for collaborative knowledge hubs like Wikipedia or Wikidata is an important yet challenging task due to multiple factors. In this paper, we construct a database of discussions happening around articles marked for deletion in several Wikis and in three languages, which we then use to evaluate a range of LMs on different tasks (from predicting the outcome of the discussion to identifying the implicit policy an individual comment might be pointing to). Our results reveal, among others, that discussions leading to deletion are easier to predict, and that, surprisingly, self-produced tags (keep, delete or redirect) don’t always help guiding the classifiers, presumably because of users’ hesitation or deliberation within comments
Anthology ID:
2025.wnut-1.14
Volume:
Proceedings of the Tenth Workshop on Noisy and User-generated Text
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
JinYeong Bak, Rob van der Goot, Hyeju Jang, Weerayut Buaphet, Alan Ramponi, Wei Xu, Alan Ritter
Venues:
WNUT | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
133–142
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.14/
DOI:
Bibkey:
Cite (ACL):
Hsuvas Borkakoty and Luis Espinosa-Anke. 2025. Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions. In Proceedings of the Tenth Workshop on Noisy and User-generated Text, pages 133–142, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions (Borkakoty & Espinosa-Anke, WNUT 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.14.pdf