Kara Schechtman


Comparing Approaches for Automatic Question Identification
Angel Maredia | Kara Schechtman | Sarah Ita Levitan | Julia Hirschberg
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best-performing approach on the Columbia/SRI/Colorado corpus.