Gaetano Cimino
2026
MIMIC: Multi-party Dialogue Augmentation via Speaker Stylistic Transfer
Gaetano Cimino | Giuseppe Carenini | Vincenzo Deufemia
Findings of the Association for Computational Linguistics: EACL 2026
Annotated data scarcity has long hindered progress in dialogue discourse parsing. To fill this gap, we introduce MIMIC, a framework for augmenting discourse-annotated corpora via speaker stylistic transfer using Large Language Models (LLMs). MIMIC rephrases utterances while preserving discourse coherence, using the MASK metric to identify speakers whose replacement enriches structural diversity and the MIRROR method to select substitute speakers who have experienced similar discourse interactions. Experimental results on the STAC and Molweni corpora show that parsers trained with MIMIC-augmented data improve on both link prediction and relation classification, with consistent gains for underrepresented discourse patterns and in low-resource scenarios.
Meta-Prompting Follow-Ups for Unsupervised Dialogue Evaluation Using Open-Source Large Language Models
Gaetano Cimino | Chuyuan Li | Giuseppe Carenini | Vincenzo Deufemia
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Automatically evaluating dialogue quality remains a major challenge due to the complexity and contextual variability of human interactions. This paper introduces DIET, a novel unsupervised, reference-free metric that uses follow-up utterances to assess dialogue quality. Unlike existing reference-free metrics, which rely on follow-ups derived from annotated data and apply a uniform set of utterances across all dialogues, DIET generates follow-ups using open-source Large Language Models (LLMs) and refines them through a selection process. Two strategies are explored: SELFMAP, where generation and evaluation are performed by the same model to ensure internal coherence, and CRAFT, where multiple models collaborate to generate diverse and complementary follow-ups, enhancing robustness and reducing model bias. Dialogue quality is measured via the likelihood of an LLM continuing the dialogue from selected follow-ups. Experiments show that DIET correlates better with human judgments than existing reference-free metrics across multiple meta-evaluation datasets.
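The abstract's core scoring idea, ranking a dialogue by how likely a model is to continue it with a selected follow-up, can be sketched as follows. This is a minimal illustration, not the paper's implementation: a real DIET-style metric would query a causal LLM for token log-probabilities, whereas `toy_logprob` below is a hypothetical add-one-smoothed stand-in, and the length normalization is an assumed design choice.

```python
import math
from collections import Counter

def toy_logprob(context, continuation):
    """Hypothetical stand-in for an LLM's log P(continuation | context):
    scores each continuation token by its add-one-smoothed frequency in
    the context. A real implementation would use a causal LM's logits."""
    ctx_counts = Counter(context.lower().split())
    denom = sum(ctx_counts.values()) + len(ctx_counts) + 1
    return sum(
        math.log((ctx_counts.get(tok, 0) + 1) / denom)
        for tok in continuation.lower().split()
    )

def dialogue_quality(dialogue, follow_ups):
    """Score a dialogue by the best length-normalized log-likelihood of
    any candidate follow-up continuing it (higher = more continuable)."""
    return max(
        toy_logprob(dialogue, f) / max(len(f.split()), 1)
        for f in follow_ups
    )
```

Under this toy scorer, a follow-up that stays on topic scores higher than an unrelated one, which is the qualitative behavior the likelihood-based metric relies on.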
2025
Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Ryan Whetten | Virgile Sucal | Anh Ngo | Kranti Chalamalasetti | Koji Inoue | Gaetano Cimino | Zachary Yang | Yuki Zenimoto | Ricardo Rodriguez
Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
2024
Coherence-based Dialogue Discourse Structure Extraction using Open-Source Large Language Models
Gaetano Cimino | Chuyuan Li | Giuseppe Carenini | Vincenzo Deufemia
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Despite the challenges posed by data sparsity in discourse parsing for dialogues, unsupervised methods have been underexplored. Leveraging recent advances in Large Language Models (LLMs), this paper investigates an unsupervised coherence-based method that builds discourse structures for multi-party dialogues using open-source LLMs fine-tuned on conversational data. Specifically, we propose two algorithms that extract dialogue structures by identifying their most coherent sub-dialogues: DS-DP employs a dynamic programming strategy, while DS-FLOW applies a greedy approach. Evaluation on the STAC corpus demonstrates a micro-F1 score of 58.1%, surpassing prior unsupervised methods. Furthermore, on a cleaned subset of the Molweni corpus, the proposed method achieves a micro-F1 score of 74.7%, highlighting its effectiveness across different corpora.
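The dynamic-programming idea behind DS-DP, choosing a partition of a dialogue that maximizes the total coherence of its sub-dialogues, can be sketched with a classic segmentation recurrence. This is an illustrative simplification, not the paper's algorithm: it assumes contiguous segments and an externally supplied `coherence` scorer (which the paper derives from LLM-based coherence, hypothetical here).

```python
def segment_dialogue(utterances, coherence):
    """Split `utterances` into contiguous sub-dialogues maximizing the
    sum of per-segment coherence scores, via the recurrence
    best[j] = max over i < j of best[i] + coherence(utterances[i:j])."""
    n = len(utterances)
    best = [float("-inf")] * (n + 1)
    best[0] = 0.0
    back = [0] * (n + 1)  # back[j] = start index of the segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + coherence(utterances[i:j])
            if score > best[j]:
                best[j], back[j] = score, i
    # Recover segment boundaries by walking the backpointers.
    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1], best[n]
```

For example, with a toy scorer that rewards only length-2 segments, the recurrence recovers the partition into consecutive pairs. A greedy DS-FLOW-style variant would instead commit to the locally best boundary at each step rather than exploring all splits.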