Corpus of Comparisons in Product Reviews (v1.0)

The dataset contains two files, "camera.comparisons" and "cell.comparisons", for the Digital Cameras and Cell Phones domains, respectively.
The sentences were taken from Amazon and Epinions customer reviews.

Annotations are sentence-based: every line corresponds to one sentence.

The format of a line is as follows:
<SENTENCE_ID> "\t" <SENTENCE> "\t" <LABEL>

<SENTENCE_ID> is a unique identifier of a sentence within a file.

<LABEL> is either "+" or "-".
"+" means that the marked pair of entities within the sentence is in a comparative relation; "-" means that it is not.

<SENTENCE> is a tokenized sentence whose tokens are separated by spaces, with a marked pair of entities. Each entity is enclosed between the tokens "<E>" and "</E>".
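
The line format described above can be read with a few lines of Python. This is only a minimal sketch, not part of the dataset distribution: the names Record and parse_line are hypothetical, and it assumes every line has exactly three tab-separated fields.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """One annotated sentence (hypothetical container, not part of the corpus)."""
    sentence_id: str
    tokens: List[str]      # includes the "<E>" and "</E>" marker tokens
    is_comparison: bool    # True for label "+", False for label "-"


def parse_line(line: str) -> Record:
    """Parse one line of the form <SENTENCE_ID> TAB <SENTENCE> TAB <LABEL>."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    sentence_id, sentence, label = fields
    if label not in ("+", "-"):
        raise ValueError(f"unexpected label: {label!r}")
    # The sentence is already tokenized, so splitting on spaces is enough.
    return Record(sentence_id, sentence.split(" "), label == "+")


if __name__ == "__main__":
    # Shortened line in the corpus format, used here only for illustration.
    demo = "AZ_797\t<E> D300s </E> is much better than <E> D7000 </E> .\t+\n"
    print(parse_line(demo))
```

Keeping the marker tokens in the token list preserves the original positions of the entities; they can be stripped later if a model needs the plain sentence.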

EXAMPLES:

1)
AZ_797	For Nikon APS-C system , <E> D300s </E> is much better than <E> D7000 </E> , especially considering the price difference ( only $ 249 difference ) .	+

2)
90696-114	Not only that , but the <E> 3500 </E> costs LESS than the <E> StarTAC </E> , which is priced at $ 229.99 !	+

3)
87569-4	Apparently Motorola has fixed the v60 antenna problem with the <E> v60s </E> and <E> v60p </E> models .	-
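
As an illustration of how the marked pair in the examples above could be recovered, the following sketch collects the tokens between each "<E>" and "</E>" marker. The function name extract_entities is hypothetical, and the code assumes that markers are standalone tokens and are never nested, as in the examples.

```python
from typing import List


def extract_entities(sentence: str) -> List[str]:
    """Return the text of each entity marked with <E> ... </E>, in order."""
    entities = []
    current = None  # tokens of the entity being read, or None when outside one
    for token in sentence.split(" "):
        if token == "<E>":
            if current is not None:
                raise ValueError("nested <E> marker")
            current = []
        elif token == "</E>":
            if current is None:
                raise ValueError("</E> without a matching <E>")
            entities.append(" ".join(current))
            current = None
        elif current is not None:
            current.append(token)
    if current is not None:
        raise ValueError("<E> without a matching </E>")
    return entities


if __name__ == "__main__":
    # Sentence field of example 2, without the identifier and label.
    sentence = ("Not only that , but the <E> 3500 </E> costs LESS than the "
                "<E> StarTAC </E> , which is priced at $ 229.99 !")
    print(extract_entities(sentence))
```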


REFERENCE:

Maksim Tkachenko and Hady W. Lauw. A Convolution Kernel Approach to Identifying Comparisons in Text, ACL 2015.

The BibTeX entry is:

@inproceedings{tkachenko-lauw:2015,
  author    = {Maksim Tkachenko and Hady W. Lauw},
  title     = {A Convolution Kernel Approach to Identifying Comparisons in Text},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics},
  year      = {2015},
  publisher = {Association for Computational Linguistics},
}
