PACE Corpus: a multilingual corpus of Polarity-annotated textual data from the domains Automotive and CEllphone

Christian Haenig, Andreas Niekler, Carsten Wuensch


Abstract
In this paper, we describe a publicly available multilingual evaluation corpus for phrase-level Sentiment Analysis that can be used to evaluate real-world applications in an industrial context. This corpus contains data from English and German Internet forums (1000 posts each) focusing on the automotive domain. The major topic of the corpus is connecting cellphones to cars and using them in cars. The presented corpus contains different types of annotations: objects (e.g. my car, my new cellphone), features (e.g. address book, sound quality) and phrase-level polarities (e.g. the best possible automobile, big problem). Each of the posts has been annotated by at least four different annotators, and these annotations are retained in their original form. The reliability of the annotations is evaluated by inter-annotator agreement scores. Besides the corpus data and format, we provide comprehensive corpus statistics. This corpus is one of the first lexical resources focusing on real-world applications that analyze the voice of the customer, which is crucial for various industrial use cases.
Anthology ID:
L14-1240
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2219–2224
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/258_Paper.pdf
Cite (ACL):
Christian Haenig, Andreas Niekler, and Carsten Wuensch. 2014. PACE Corpus: a multilingual corpus of Polarity-annotated textual data from the domains Automotive and CEllphone. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2219–2224, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
PACE Corpus: a multilingual corpus of Polarity-annotated textual data from the domains Automotive and CEllphone (Haenig et al., LREC 2014)