Carsten Wuensch
2014
PACE Corpus: a multilingual corpus of Polarity-annotated textual data from the domains Automotive and CEllphone
Christian Haenig
|
Andreas Niekler
|
Carsten Wuensch
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we describe a publicly available multilingual evaluation corpus for phrase-level Sentiment Analysis that can be used to evaluate real world applications in an industrial context. This corpus contains data from English and German Internet forums (1000 posts each) focusing on the automotive domain. The major topic of the corpus is connecting and using cellphones to/in cars. The presented corpus contains different types of annotations: objects (e.g. my car, my new cellphone), features (e.g. address book, sound quality) and phrase-level polarities (e.g. the best possible automobile, big problem). Each of the posts has been annotated by at least four different annotators ― these annotations are retained in their original form. The reliability of the annotations is evaluated by inter-annotator agreement scores. Besides the corpus data and format, we provide comprehensive corpus statistics. This corpus is one of the first lexical resources focusing on real world applications that analyze the voice of the customer which is crucial for various industrial use cases.