A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations
Motoki Taniguchi, Yoshihiro Ueda, Tomoki Taniguchi, Tomoko Ohkuma
Abstract
We present a large-scale corpus of e-mail conversations with domain-agnostic, two-level dialogue act (DA) annotations, with the goal of better understanding asynchronous conversations. We annotate over 6,000 messages and 35,000 sentences from more than 2,000 threads. To obtain domain-independent and application-independent DA annotations, we adopt ISO standard 24617-2 as the annotation scheme. To assess the difficulty of DA recognition on our corpus, we evaluate several baseline models, including a pre-trained contextual representation model. The experimental results show that BERT outperforms the other neural network models, including previous state-of-the-art models, but falls short of human performance. We also demonstrate that DA tags at two levels of granularity enable a DA recognition model to learn efficiently through multi-task learning. Evaluating a model trained on our corpus on asynchronous conversations from other domains reveals the domain independence of our DA annotations.
- Anthology ID:
- 2020.coling-main.436
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 4969–4980
- URL:
- https://aclanthology.org/2020.coling-main.436
- DOI:
- 10.18653/v1/2020.coling-main.436
- Cite (ACL):
- Motoki Taniguchi, Yoshihiro Ueda, Tomoki Taniguchi, and Tomoko Ohkuma. 2020. A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4969–4980, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations (Taniguchi et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.436.pdf
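The abstract reports that two-level DA tags help a recognition model learn efficiently through multi-task learning. This page does not describe the paper's architecture, so what follows is only a minimal sketch under stated assumptions. It assumes a model emits separate logits for fine-grained and coarse-grained tags and that training minimizes a weighted sum of the two cross-entropy losses. The tag names, the `FINE_TO_COARSE` mapping, the helper names, and the weight `alpha` are all hypothetical and do not reflect the corpus's actual tag inventory.

```python
import math

# Hypothetical two-level tag set: each fine-grained DA tag maps to exactly one
# coarse tag. Illustrative only; not the corpus's actual inventory.
FINE_TO_COARSE = {
    "inform": "info_providing",
    "answer": "info_providing",
    "propositional_question": "info_seeking",
    "set_question": "info_seeking",
    "request": "directive",
    "thanking": "social",
}
FINE_TAGS = sorted(FINE_TO_COARSE)
COARSE_TAGS = sorted(set(FINE_TO_COARSE.values()))


def targets_for(fine_tag):
    """Return (fine_index, coarse_index); the coarse label is derived from the fine one."""
    return FINE_TAGS.index(fine_tag), COARSE_TAGS.index(FINE_TO_COARSE[fine_tag])


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    return -math.log(softmax(logits)[target_index])


def multitask_loss(fine_logits, coarse_logits, fine_target, coarse_target, alpha=0.5):
    """Weighted sum of fine- and coarse-level losses (alpha is a hypothetical hyperparameter)."""
    fine_loss = cross_entropy(fine_logits, fine_target)
    coarse_loss = cross_entropy(coarse_logits, coarse_target)
    return alpha * fine_loss + (1 - alpha) * coarse_loss


# Usage: uniform (all-zero) logits for a sentence annotated as "inform".
fine_idx, coarse_idx = targets_for("inform")
loss = multitask_loss([0.0] * len(FINE_TAGS), [0.0] * len(COARSE_TAGS), fine_idx, coarse_idx)
```

Because the coarse target is derived from the fine tag, a single annotation supplies both supervision signals. Setting `alpha=1.0` reduces the objective to ordinary single-task training on fine-grained tags.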