MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier
Abstract
We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes word2vec’s features, paragraph vector (batch and online) and bivec for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.
- Anthology ID: L16-1662
- Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month: May
- Year: 2016
- Address: Portorož, Slovenia
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 4188–4192
- URL: https://aclanthology.org/L16-1662
- Cite (ACL): Alexandre Bérard, Christophe Servan, Olivier Pietquin, and Laurent Besacier. 2016. MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4188–4192, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal): MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP (Bérard et al., LREC 2016)
- PDF: https://preview.aclanthology.org/auto-file-uploads/L16-1662.pdf
- Code: eske/multivec