SP-10K: A Large-scale Evaluation Set for Selectional Preference Acquisition

Hongming Zhang, Hantian Ding, Yangqiu Song


Abstract
Selectional Preference (SP) is a commonly observed language phenomenon and has proved useful in many natural language processing tasks. To provide a better evaluation method for SP models, we introduce SP-10K, a large-scale evaluation set that provides human ratings for the plausibility of 10,000 SP pairs over five SP relations, covering the 2,500 most frequent verbs, nouns, and adjectives in American English. Three representative SP acquisition methods based on pseudo-disambiguation are evaluated with SP-10K. To demonstrate the importance of our dataset, we investigate the relationship between SP-10K and the commonsense knowledge in ConceptNet5 and show the potential of using SP to represent commonsense knowledge. We also use the Winograd Schema Challenge to prove that the proposed new SP relations are essential for the hard pronoun coreference resolution problem.
Anthology ID:
P19-1071
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
722–731
URL:
https://aclanthology.org/P19-1071
DOI:
10.18653/v1/P19-1071
Cite (ACL):
Hongming Zhang, Hantian Ding, and Yangqiu Song. 2019. SP-10K: A Large-scale Evaluation Set for Selectional Preference Acquisition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 722–731, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
SP-10K: A Large-scale Evaluation Set for Selectional Preference Acquisition (Zhang et al., ACL 2019)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/P19-1071.pdf
Code:
HKUST-KnowComp/SP-10K
Data:
SP-10K, New York Times Annotated Corpus, WSC