Abstract
Learning to hash via a generative model has become a powerful paradigm for fast similarity search in document retrieval. To obtain binary representations (i.e., hash codes), a discrete distribution prior (i.e., a Bernoulli distribution) is applied to train the variational autoencoder (VAE). However, the discrete stochastic layer is usually incompatible with backpropagation in the training stage and thus causes a gradient flow problem, because the reparameterization trick for sampling from a discrete distribution still involves non-differentiable operators. In this paper, we propose a method, Doc2hash, that solves the gradient flow problem of the discrete stochastic layer by using a continuous relaxation on the priors, and trains the generative model in an end-to-end manner to generate hash codes. In qualitative and quantitative experiments, we show the proposed model outperforms other state-of-the-art methods.
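To make the abstract's core idea concrete, here is a minimal sketch of a continuous relaxation of a discrete (Bernoulli) stochastic layer, in the spirit of a binary-concrete/Gumbel-Sigmoid sample with a straight-through estimator. This is an illustrative assumption of the general technique, not the authors' released implementation (see yifeiacc/doc2hash for that); the function name, temperature value, and code length below are hypothetical.

```python
# Sketch (not the authors' code): relax non-differentiable Bernoulli sampling
# into a Gumbel-Sigmoid sample so gradients flow through the stochastic layer.
import torch

def gumbel_sigmoid(logits, tau=1.0, hard=True):
    """Relaxed Bernoulli sample. With hard=True, the forward pass emits exact
    0/1 hash bits while the backward pass keeps the soft sample's gradient
    (straight-through estimator)."""
    # Logistic noise, i.e., the difference of two Gumbel(0, 1) samples.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)
    y_soft = torch.sigmoid((logits + noise) / tau)
    if not hard:
        return y_soft
    y_hard = (y_soft > 0.5).float()
    # Forward value is y_hard; gradient is taken through y_soft.
    return y_hard + (y_soft - y_soft.detach())

# Usage: logits from a document encoder -> 32 differentiable hash bits.
logits = torch.randn(4, 32, requires_grad=True)  # batch of 4 documents
codes = gumbel_sigmoid(logits, tau=0.5)
codes.sum().backward()  # gradients reach `logits` despite the discrete codes
print(codes[0], logits.grad.shape)
```

The temperature `tau` controls how closely the soft sample approximates a true Bernoulli draw; annealing it toward zero during training is a common design choice for this family of relaxations.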
- Anthology ID:
- N19-1232
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2235–2240
- URL:
- https://aclanthology.org/N19-1232
- DOI:
- 10.18653/v1/N19-1232
- Cite (ACL):
- Yifei Zhang and Hao Zhu. 2019. Doc2hash: Learning Discrete Latent variables for Documents Retrieval. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2235–2240, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Doc2hash: Learning Discrete Latent variables for Documents Retrieval (Zhang & Zhu, NAACL 2019)
- PDF:
- https://aclanthology.org/N19-1232.pdf
- Code:
- yifeiacc/doc2hash
- Data:
- RCV1