Massimiliano Pronesti


2024

pdf
Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups
Simon Mille | Massimiliano Pronesti | Craig Thomson | Michela Lorandi | Sophie Fitzpatrick | Rudali Huidrom | Mohammed Sabry | Amy O’Riordan | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations

Wikipedia is known to have systematic gaps in its coverage that correspond to under-resourced languages as well as underrepresented groups. This paper presents a new tool to support efforts to fill in these gaps by automatically generating draft articles and facilitating post-editing and uploading to Wikipedia. A rule-based generator and an input-constrained LLM are used to generate two alternative articles, enabling the often more fluent, but error-prone, LLM-generated article to be content-checked against the more reliable, but less fluent, rule-generated article.