MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health

Xingyu Liu, Vincent Segonne, Aidan Mannion, Didier Schwab, Lorraine Goeuriot, François Portet


Abstract
This article presents MedDialog-FR, a large publicly available corpus of French medical conversations for the medical domain. Motivated by the lack of French dialogue corpora for data-driven dialogue systems and the paucity of available information related to women’s intimate health, we introduce an annotated corpus of question-and-answer dialogues between a real patient and a real doctor concerning women’s intimate health. The corpus is composed of about 20,000 dialogues automatically translated from the English version of MedDialog-EN. The corpus test set is composed of 1,400 dialogues that have been manually post-edited and annotated with 22 categories from the UMLS ontology. We also fine-tuned state-of-the-art reference models to automatically perform multi-label classification and response generation to give an initial performance benchmark and highlight the difficulty of the tasks.
Anthology ID:
2024.cl4health-1.21
Volume:
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
Venues:
CL4Health | WS
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
173–183
Language:
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.cl4health-1.21/
DOI:
Bibkey:
Cite (ACL):
Xingyu Liu, Vincent Segonne, Aidan Mannion, Didier Schwab, Lorraine Goeuriot, and François Portet. 2024. MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 173–183, Torino, Italia. ELRA and ICCL.
Cite (Informal):
MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health (Liu et al., CL4Health 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.cl4health-1.21.pdf