Dataset for a Neural Natural Language Interface for Databases (NNLIDB)

Florin Brad, Radu Cristian Alexandru Iacob, Ionel Alexandru Hosu, Traian Rebedea

Abstract
Progress in natural language interfaces to databases (NLIDB) has been slow mainly due to linguistic issues (such as language ambiguity) and domain portability. Moreover, the lack of a large corpus to be used as a standard benchmark has made data-driven approaches difficult to develop and compare. In this paper, we revisit the problem of NLIDBs and recast it as a sequence translation problem. To this end, we introduce a large dataset extracted from the Stack Exchange Data Explorer website, which can be used for training neural natural language interfaces for databases. We also report encouraging baseline results on a smaller manually annotated test corpus, obtained using an attention-based sequence-to-sequence neural network.
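In the sequence-translation framing described in the abstract, each natural language question is the source sequence and its SQL query is the target sequence for an encoder-decoder model. The sketch below shows one possible way to turn such pairs into token-ID sequences. Everything in it is an assumption for illustration and does not come from the paper or its dataset: the regex tokenizer, the special tokens, the separate source and target vocabularies, and the two question/query pairs. The pairs are hypothetical and are written in the style of Stack Exchange Data Explorer T-SQL over the `Users` and `Posts` tables.

```python
import re
from collections import Counter

# Hypothetical special tokens; the paper's actual preprocessing may differ.
PAD, BOS, EOS, UNK = "<pad>", "<s>", "</s>", "<unk>"

def tokenize(text):
    # Lowercase, then split into word/number runs and single punctuation or operator symbols.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(sequences, min_count=1):
    # Reserve IDs 0-3 for special tokens, then add corpus tokens in sorted order.
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {PAD: 0, BOS: 1, EOS: 2, UNK: 3}
    for tok, c in sorted(counts.items()):
        if c >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, add_bos=False):
    # Map tokens to IDs (unknown tokens become UNK) and append the end-of-sequence marker.
    ids = [vocab.get(t, vocab[UNK]) for t in tokens]
    return ([vocab[BOS]] if add_bos else []) + ids + [vocab[EOS]]

def decode(ids, vocab):
    # Invert the vocabulary and drop PAD/BOS/EOS markers.
    inv = {i: t for t, i in vocab.items()}
    skip = {vocab[PAD], vocab[BOS], vocab[EOS]}
    return [inv[i] for i in ids if i not in skip]

# Hypothetical (question, SQL) pairs, not taken from the released dataset.
pairs = [
    ("Which users have the highest reputation?",
     "SELECT TOP 10 DisplayName, Reputation FROM Users ORDER BY Reputation DESC"),
    ("How many posts have a score above 100?",
     "SELECT COUNT(*) FROM Posts WHERE Score > 100"),
]

src_tok = [tokenize(q) for q, _ in pairs]
tgt_tok = [tokenize(s) for _, s in pairs]
src_vocab = build_vocab(src_tok)
tgt_vocab = build_vocab(tgt_tok)

# Encoder inputs end with EOS; decoder targets also start with BOS.
src_ids = [encode(t, src_vocab) for t in src_tok]
tgt_ids = [encode(t, tgt_vocab, add_bos=True) for t in tgt_tok]
```

Keeping separate source and target vocabularies is one common choice for sequence-to-sequence models, because SQL keywords and schema names rarely overlap with question words. The attention-based encoder-decoder itself is not sketched here.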
Anthology ID:
I17-1091
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
906–914
URL:
https://aclanthology.org/I17-1091
Bibkey:
Cite (ACL):
Florin Brad, Radu Cristian Alexandru Iacob, Ionel Alexandru Hosu, and Traian Rebedea. 2017. Dataset for a Neural Natural Language Interface for Databases (NNLIDB). In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 906–914, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Dataset for a Neural Natural Language Interface for Databases (NNLIDB) (Brad et al., IJCNLP 2017)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/I17-1091.pdf
Dataset:
I17-1091.Datasets.zip