Abstract
Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.

- Anthology ID:
- Q16-1030
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 4
- Year:
- 2016
- Address:
- Cambridge, MA
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 417–430
- URL:
- https://aclanthology.org/Q16-1030
- DOI:
- 10.1162/tacl_a_00108
- Cite (ACL):
- Dominique Osborne, Shashi Narayan, and Shay B. Cohen. 2016. Encoding Prior Knowledge with Eigenword Embeddings. Transactions of the Association for Computational Linguistics, 4:417–430.
- Cite (Informal):
- Encoding Prior Knowledge with Eigenword Embeddings (Osborne et al., TACL 2016)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/Q16-1030.pdf
- Data
- FrameNet