Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation
Sai R. Gouravajhala, Andrew M. Vernier, Yiming Shi, Zihan Li, Mark S. Ackerman, Jonathan K. Kummerfeld
Abstract
Conversation disentanglement is the task of taking a log of intertwined conversations from a shared channel and breaking the log into individual conversations. The standard datasets for disentanglement are in a single domain and were annotated by linguistics experts with careful training for the task. In this paper, we introduce the first multi-domain dataset and a study of annotation by people without linguistics expertise or extensive training. We experiment with several variations in interfaces, conducting user studies with domain experts and crowd workers. We also test a hypothesis from prior work that link-based annotation is more accurate, finding that it actually has comparable accuracy to set-based annotation. Our new dataset will support the development of more useful systems for this task, and our experimental findings suggest that users are capable of improving the usefulness of these systems by accurately annotating their own data.
- Anthology ID:
- 2023.alta-1.12
- Volume:
- Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
- Month:
- November
- Year:
- 2023
- Address:
- Melbourne, Australia
- Editors:
- Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
- Venue:
- ALTA
- Publisher:
- Association for Computational Linguistics
- Pages:
- 112–117
- URL:
- https://aclanthology.org/2023.alta-1.12
- Cite (ACL):
- Sai R. Gouravajhala, Andrew M. Vernier, Yiming Shi, Zihan Li, Mark S. Ackerman, and Jonathan K. Kummerfeld. 2023. Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation. In Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association, pages 112–117, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation (Gouravajhala et al., ALTA 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.alta-1.12.pdf