Abstract
Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.
 - Anthology ID:
 - D18-1091
 - Volume:
 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
 - Month:
 - October-November
 - Year:
 - 2018
 - Address:
 - Brussels, Belgium
 - Editors:
 - Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
 - Venue:
 - EMNLP
 - SIG:
 - SIGDAT
 - Publisher:
 - Association for Computational Linguistics
 - Pages:
 - 798–803
 - URL:
 - https://aclanthology.org/D18-1091
 - DOI:
 - 10.18653/v1/D18-1091
 - Cite (ACL):
 - Manaal Faruqui and Dipanjan Das. 2018. Identifying Well-formed Natural Language Questions. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 798–803, Brussels, Belgium. Association for Computational Linguistics.
 - Cite (Informal):
 - Identifying Well-formed Natural Language Questions (Faruqui & Das, EMNLP 2018)
 - PDF:
 - https://preview.aclanthology.org/ingest-acl-2023-videos/D18-1091.pdf
 - Data
 - Paralex