Abstract
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
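The "compositional n-gram features" named in the title compose a sentence embedding by averaging learned source embeddings of the sentence's words and word n-grams. Below is a minimal sketch of that averaging step; the random vectors stand in for trained sent2vec parameters, and the function names and toy lookup table are illustrative, not the authors' code:

```python
import numpy as np

def ngrams(tokens, n_max):
    """Enumerate all unigrams up to n_max-grams of a token list."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def embed_sentence(sentence, vectors, dim, n_max=2):
    """Average the vectors of all n-grams present in the sentence."""
    feats = [g for g in ngrams(sentence.split(), n_max) if g in vectors]
    if not feats:
        return np.zeros(dim)
    return np.mean([vectors[g] for g in feats], axis=0)

# Toy lookup table standing in for trained sent2vec parameters.
rng = np.random.default_rng(0)
dim = 4
vectors = {g: rng.standard_normal(dim)
           for g in ngrams("the cat sat on the mat".split(), 2)}
print(embed_sentence("the cat sat on the mat", vectors, dim))
```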
- Anthology ID: N18-1049
- Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
- Month: June
- Year: 2018
- Address: New Orleans, Louisiana
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 528–540
- URL: https://aclanthology.org/N18-1049
- DOI: 10.18653/v1/N18-1049
- Cite (ACL): Matteo Pagliardini, Prakhar Gupta, and Martin Jaggi. 2018. Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 528–540, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal): Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features (Pagliardini et al., NAACL 2018)
- PDF: https://aclanthology.org/N18-1049.pdf
- Code: epfml/sent2vec + additional community code (see the usage sketch below)
- Data: MPQA Opinion Corpus, SICK, STS 2014
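For readers who want to try the released models, here is a minimal usage sketch assuming the Python bindings shipped in the epfml/sent2vec repository; the model filename is a placeholder for one of the pretrained models linked from that repository, and the README should be consulted for the exact interface:

```python
import sent2vec

# Load a pretrained model; the .bin path is a placeholder for one of the
# pretrained models distributed via the epfml/sent2vec repository.
model = sent2vec.Sent2vecModel()
model.load_model('wiki_unigrams.bin')

# Embed one sentence and a small batch; both should return NumPy arrays.
emb = model.embed_sentence("the quick brown fox jumps over the lazy dog")
embs = model.embed_sentences([
    "sentence embeddings are useful",
    "unsupervised objectives scale well",
])
print(emb.shape, embs.shape)
```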