GeneFRDebate: Generated French Debates from News Articles with Industrial-Expert Summaries

Rim Abrougui, Guillaume Lechien, Elisabeth Savatier, Benoît Laurent


Abstract
Summarizing domain-specific conversations, such as political debates, remains challenging despite advances in large language models (LLMs), and resources for French debates are particularly limited. We present GeneFRDebate, a new dataset of synthetic French political debates generated from real-world news articles using an LLM, while keeping expert-written summaries unchanged. Our pipeline combines prompt engineering, human curation, and quality evaluation using both automatic metrics and expert assessment. We also provide baseline experiments with small-scale LLMs (≤8B parameters), demonstrating the dataset’s usefulness for training and evaluation. This work shows that carefully generated synthetic data with human oversight can complement existing corpora, supporting research in multilingual and domain-specific dialogue summarization.
Anthology ID:
2026.lrec-main.143
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
1831–1841
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.143/
Cite (ACL):
Rim Abrougui, Guillaume Lechien, Elisabeth Savatier, and Benoît Laurent. 2026. GeneFRDebate: Generated French Debates from News Articles with Industrial-Expert Summaries. International Conference on Language Resources and Evaluation, main:1831–1841.
Cite (Informal):
GeneFRDebate: Generated French Debates from News Articles with Industrial-Expert Summaries (Abrougui et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.143.pdf