Abstract
We present a comprehensive survey of available corpora for multi-party dialogue. We survey over 300 publications related to multi-party dialogue and catalogue all available corpora in a novel taxonomy. We analyze methods of data collection for multi-party dialogue corpora and identify several lacunae in existing data collection approaches used to collect such dialogue. We present this survey, the first survey to focus exclusively on multi-party dialogue corpora, to motivate research in this area. Through our discussion of existing data collection methods, we identify desiderata and guiding principles for multi-party data collection to contribute further towards advancing this area of dialogue research.
- Anthology ID:
- 2021.sigdial-1.36
- Volume:
- Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
- Month:
- July
- Year:
- 2021
- Address:
- Singapore and Online
- Venue:
- SIGDIAL
- SIG:
- SIGDIAL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 338–352
- URL:
- https://aclanthology.org/2021.sigdial-1.36
- Cite (ACL):
- Khyati Mahajan and Samira Shaikh. 2021. On the Need for Thoughtful Data Collection for Multi-Party Dialogue: A Survey of Available Corpora and Collection Methods. In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 338–352, Singapore and Online. Association for Computational Linguistics.
- Cite (Informal):
- On the Need for Thoughtful Data Collection for Multi-Party Dialogue: A Survey of Available Corpora and Collection Methods (Mahajan & Shaikh, SIGDIAL 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.sigdial-1.36.pdf
- Data:
- CRD3, Interview, MELD, Molweni, OpenSubtitles, Serial Speakers