Elisabeth Savatier


2026

Summarizing domain-specific and multi-speaker conversations, such as political debates, remains challenging under noisy ASR conditions. In industrial contexts, large language models (LLMs) are often impractical due to resource and confidentiality constraints. This work evaluates whether smaller LLMs (up to 8B parameters) can produce reliable summaries in such settings. Experiments on French debates show that ASR noise significantly degrades both the accuracy and the readability of summaries, while fine-tuning on clean, domain-related data improves robustness and reduces hallucinations. We also analyze person-name mentions as indicators of speaker faithfulness, finding that fine-tuning identifies all speakers in far more debates than chain-of-thought prompting does. However, evaluations on limited industrial data show that fine-tuned models still struggle to generalize to unseen speakers and topics.
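
To make the speaker-faithfulness analysis concrete, the Python sketch below shows one way a person-name coverage check could be computed. The helper name, the toy debates, and the surname-substring matching rule are illustrative assumptions, not the evaluation code used in this work (which may, for instance, rely on NER-based person-name extraction).

    # Minimal sketch: does a generated summary mention every known
    # debate participant? (Hypothetical helper; the actual evaluation
    # may use NER-based name extraction instead of substring matching.)
    def speakers_covered(summary: str, participants: list[str]) -> bool:
        text = summary.lower()
        return all(name.lower() in text for name in participants)

    debates = [
        {"summary": "Dupont defended the reform while Martin opposed it.",
         "participants": ["Dupont", "Martin"]},
        {"summary": "Dupont defended the reform at length.",
         "participants": ["Dupont", "Martin"]},
    ]

    # Fraction of debates whose summary names all speakers, mirroring
    # the "all speakers identified" criterion discussed above.
    rate = sum(speakers_covered(d["summary"], d["participants"])
               for d in debates) / len(debates)
    print(f"Debates with all speakers mentioned: {rate:.0%}")  # 50%
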
Summarizing domain-specific conversations, such as political debates, remains challenging despite advances in large language models (LLMs), and resources for French debates are particularly limited. We present GeneFRDebate, a new dataset of synthetic French political debates generated from real-world news articles with an LLM, each paired with the unchanged, expert-written summary of the original debate. Our pipeline combines prompt engineering, human curation, and quality evaluation based on both automatic metrics and expert assessment. We also provide baseline experiments with small-scale LLMs (up to 8B parameters), demonstrating the dataset’s usefulness for training and evaluation. This work shows that carefully generated synthetic data with human oversight can complement existing corpora, supporting research in multilingual and domain-specific dialogue summarization.
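
As an illustration of the generation step, the Python sketch below shows one way the article-to-debate prompting could look. The prompt wording, the openai client, the generate_debate helper, and the model name are placeholder assumptions; the actual prompts, model, and curation tooling of the GeneFRDebate pipeline are not reproduced here.

    # Minimal sketch of the synthetic-debate generation step.
    # Assumptions: the openai Python client and a placeholder model;
    # the real pipeline's prompts and model choice may differ.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = (
        "You are given a French news article. Write a realistic French "
        "political debate transcript between the named speakers, "
        "covering the article's main topics.\n\n"
        "Article:\n{article}\n\nSpeakers: {speakers}"
    )

    def generate_debate(article: str, speakers: list[str]) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder, not the paper's model
            messages=[{
                "role": "user",
                "content": PROMPT.format(article=article,
                                         speakers=", ".join(speakers)),
            }],
        )
        return response.choices[0].message.content

In such a setup, each generated transcript would then be paired with the unchanged expert-written summary of the original debate and passed through the human curation and quality checks described above.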