MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health
Xingyu Liu, Vincent Segonne, Aidan Mannion, Didier Schwab, Lorraine Goeuriot, François Portet
Abstract
This article presents MedDialog-FR, a large publicly available corpus of French medical conversations for the medical domain. Motivated by the lack of French dialogue corpora for data-driven dialogue systems and the paucity of available information related to women’s intimate health, we introduce an annotated corpus of question-and-answer dialogues between a real patient and a real doctor concerning women’s intimate health. The corpus is composed of about 20,000 dialogues automatically translated from the English version of MedDialog-EN. The corpus test set is composed of 1,400 dialogues that have been manually post-edited and annotated with 22 categories from the UMLS ontology. We also fine-tuned state-of-the-art reference models to automatically perform multi-label classification and response generation to give an initial performance benchmark and highlight the difficulty of the tasks.
- Anthology ID: 2024.cl4health-1.21
- Volume: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
- Venues: CL4Health | WS
- Publisher: ELRA and ICCL
- Pages: 173–183
- URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.cl4health-1.21/
- Cite (ACL): Xingyu Liu, Vincent Segonne, Aidan Mannion, Didier Schwab, Lorraine Goeuriot, and François Portet. 2024. MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 173–183, Torino, Italia. ELRA and ICCL.
- Cite (Informal): MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health (Liu et al., CL4Health 2024)
- PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.cl4health-1.21.pdf