SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach

Michael Petrochuk, Luke Zettlemoyer


Abstract
The SimpleQuestions dataset is one of the most commonly used benchmarks for studying single-relation factoid questions. In this paper, we present new evidence that this benchmark can be nearly solved by standard methods. First, we show that ambiguity in the data bounds performance at 83.4%; many questions have more than one equally plausible interpretation. Second, we introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, despite using standard methods. Finally, we report an empirical analysis showing that the upperbound is loose; roughly a quarter of the remaining errors are also not resolvable from the linguistic signal. Together, these results suggest that the SimpleQuestions dataset is nearly solved.
Anthology ID:
D18-1051
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
554–558
URL:
https://aclanthology.org/D18-1051
DOI:
10.18653/v1/D18-1051
Cite (ACL):
Michael Petrochuk and Luke Zettlemoyer. 2018. SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 554–558, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (Petrochuk & Zettlemoyer, EMNLP 2018)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/D18-1051.pdf
Video:
https://preview.aclanthology.org/ml4al-ingestion/D18-1051.mp4
Data
SimpleQuestions