Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets
Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans
Abstract
Crowd-sourcing has been one of the primary ways to curate conversational data, especially for certain scenarios like grounding in knowledge. In this setting, using online platforms like AMT, non-expert participants are hired to converse with each other, following instructions which try to guide the outcome towards the desired format. The resulting data is then used for different parts of dialog modelling like knowledge selection and response selection/generation. In this work, we take a closer look at two of the most popular knowledge grounded dialog (KGD) datasets. Investigating potential biases and artefacts in knowledge selection labels, we observe that in many cases the ‘knowledge selection flow’ simply follows the order of presented knowledge pieces. In Wizard of Wikipedia (the most popular KGD dataset) we use simple content-agnostic models based on this bias to get significant knowledge selection performance. In Topical-Chat we see a similar correlation between the knowledge selection sequence and the order of entities and their segments, as provided to crowd-source workers. We believe that the observed results question the significance and origin of the presumed dialog-level attributes like ‘knowledge flow’ in these crowd-sourced datasets.
- Anthology ID:
- 2023.dialdoc-1.12
- Volume:
- Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
- Venue:
- dialdoc
- Publisher:
- Association for Computational Linguistics
- Pages:
- 109–121
- URL:
- https://aclanthology.org/2023.dialdoc-1.12
- DOI:
- 10.18653/v1/2023.dialdoc-1.12
- Cite (ACL):
- Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, and Walter Daelemans. 2023. Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets. In Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, pages 109–121, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets (Lotfi et al., dialdoc 2023)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2023.dialdoc-1.12.pdf