The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation

Pavel Braslavski, Dmitrii Iarosh, Nikita Sergeevich Sushko, Andrey Sakhovskiy, Vasily Konovalov, Elena Tutubalina, Alexander Panchenko


Abstract
We present a configurable pipeline and the associated code that can be used to generate multilingual sets of entities with specified characteristics, such as domain, geographical location and popularity, using data from Wikipedia and Wikidata. These datasets are intended for evaluating the factuality of LLMs’ long-form generation, thereby complementing evaluation based on short-form QA datasets. We present the RiDiC dataset as an example of this approach. RiDiC contains 3,000 entities from three domains – rivers, natural disasters, and car models – spanning different popularity tiers. Each entity is accompanied by its geographical location, English and Chinese names (if available) and relevant English and Chinese Wikipedia content, which is used to evaluate LLMs’ responses. Generations about RiDiC entities were obtained from three LLMs in English and Chinese. These were then evaluated using a third-party factuality checker, which showed that entities from our dataset caused even frontier models to hallucinate. The code, data and generation/evaluation scripts have been released to enable the approach to be extended to new LLMs, languages and domains.
Anthology ID:
2026.lrec-main.776
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
9893–9904
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.776/
Cite (ACL):
Pavel Braslavski, Dmitrii Iarosh, Nikita Sergeevich Sushko, Andrey Sakhovskiy, Vasily Konovalov, Elena Tutubalina, and Alexander Panchenko. 2026. The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 9893–9904, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation (Braslavski et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.776.pdf