Abstract
The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks (tasks that amount to question relevancy ranking) involve complex pipelines and manual feature engineering. Despite this complexity, many of them still fail to beat the IR baseline, i.e., the rankings provided by Google’s search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed-forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
- Anthology ID:
- D18-1515
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4810–4815
- URL:
- https://aclanthology.org/D18-1515
- DOI:
- 10.18653/v1/D18-1515
- Cite (ACL):
- Ana Gonzalez, Isabelle Augenstein, and Anders Søgaard. 2018. A strong baseline for question relevancy ranking. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4810–4815, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- A strong baseline for question relevancy ranking (Gonzalez et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/D18-1515.pdf
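The abstract's central idea is to represent each question pair as a small vector of distance measures and feed that vector to a feed-forward network. The sketch below illustrates only the feature side of that idea. It is an assumption-laden illustration, not the authors' implementation: the abstract does not list the 14 measures, so the four used here (cosine, Jaccard, Euclidean, and Manhattan distance over whitespace-tokenized bag-of-words counts) and the helper names `bag_of_words` and `pair_features` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens -> term counts (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of the two count vectors; 1.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def jaccard_distance(a: Counter, b: Counter) -> float:
    """1 - |shared vocabulary| / |combined vocabulary|, ignoring counts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def euclidean_distance(a: Counter, b: Counter) -> float:
    """L2 distance between the count vectors over their joint vocabulary."""
    vocab = a.keys() | b.keys()
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocab))


def manhattan_distance(a: Counter, b: Counter) -> float:
    """L1 distance between the count vectors over their joint vocabulary."""
    vocab = a.keys() | b.keys()
    return float(sum(abs(a[t] - b[t]) for t in vocab))


def pair_features(q1: str, q2: str) -> list[float]:
    """Hypothetical feature vector for a question pair (4 measures, not the paper's 14)."""
    a, b = bag_of_words(q1), bag_of_words(q2)
    return [
        cosine_distance(a, b),
        jaccard_distance(a, b),
        euclidean_distance(a, b),
        manhattan_distance(a, b),
    ]


if __name__ == "__main__":
    feats = pair_features("how do I renew my visa", "how can I renew a visa")
    print(feats)
```

For the example pair, the two questions share 4 of their 8 distinct tokens, so the Jaccard distance is 0.5, and the four non-shared tokens each differ by one count, giving a Euclidean distance of 2.0 and a Manhattan distance of 4.0. In a setup like the one the abstract describes, vectors of this kind would be the input to a small feed-forward classifier; because the measures compare surface token statistics rather than language-specific resources, they stay language-independent.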