Abstract
Content analysis of text data requires semantic representations that are difficult to obtain automatically, as they may require large handcrafted knowledge bases or manually annotated examples. Unsupervised, autonomous methods for generating semantic representations are of great interest given the huge volumes of text to be exploited in all kinds of applications. In this work we describe the generation and validation of semantic representations in the vector space paradigm for Spanish. The method used is GloVe (Pennington, 2014), one of the best-performing reported methods, and the vectors were trained on the Spanish Wikipedia. The learned vectors are evaluated on word analogy and similarity tasks (Pennington, 2014; Baroni, 2014; Mikolov, 2013a). The vector set and a Spanish version of some widely used semantic relatedness tests are made publicly available.
- Anthology ID:
- L16-1584
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3681–3685
- URL:
- https://aclanthology.org/L16-1584
- Cite (ACL):
- Mathias Etcheverry and Dina Wonsever. 2016. Spanish Word Vectors from Wikipedia. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3681–3685, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Spanish Word Vectors from Wikipedia (Etcheverry & Wonsever, LREC 2016)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/L16-1584.pdf