Abstract
Unsupervised document representation learning is an important task that provides pre-trained features for NLP applications. Unlike most previous work, which learns embeddings by self-prediction over the surface form of the text, we explicitly exploit inter-document information and directly model the relations between documents in the embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model has errors that are 5% to 13% lower than those of state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in the scarce-label setting.
- Anthology ID:
- N19-1255
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Jill Burstein, Christy Doran, Thamar Solorio
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2465–2474
- URL:
- https://aclanthology.org/N19-1255
- DOI:
- 10.18653/v1/N19-1255
- Cite (ACL):
- Hong-You Chen, Chin-Hua Hu, Leila Wehbe, and Shou-De Lin. 2019. Self-Discriminative Learning for Unsupervised Document Embedding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2465–2474, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Self-Discriminative Learning for Unsupervised Document Embedding (Chen et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/naacl24-info/N19-1255.pdf
- Data:
- IMDb Movie Reviews