MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse

Samira Shaikh, Tomek Strzalkowski, Aaron Broadwell, Jennifer Stromer-Galley, Sarah Taylor, Nick Webb


Abstract
In this paper, we describe our experience with collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in on-line interactions. Such models will help capturing the dialogue dynamics that are essential for developing, among others, realistic human-machine dialogue systems, including autonomous virtual chat agents. In this paper we describe data collection method used and the characteristics of the initial dataset of English chat. We have devised a multi-tiered collection process in which the subjects start from simple, free-flowing conversations and progress towards more complex and structured interactions. In this paper, we report on the first two stages of this process, which were recently completed. The third, large-scale collection effort is currently being conducted. All English dialogue has been annotated at four levels: communication links, dialogue acts, local topics and meso-topics. Some details of these annotations will be discussed later in this paper, although a full description is impossible within the scope of this article.
Anthology ID:
L10-1050
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/85_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Samira Shaikh, Tomek Strzalkowski, Aaron Broadwell, Jennifer Stromer-Galley, Sarah Taylor, and Nick Webb. 2010. MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse (Shaikh et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/85_Paper.pdf