English WordNet Random Walk Pseudo-Corpora
Filip Klubička, Alfredo Maldonado, Abhijit Mahalunkar, John Kelleher
Abstract
This resource description paper describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of pseudo-corpora. We find that different combinations of parameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research. Because these do not exhaust all possible combinations of hyperparameters, we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
- Anthology ID:
- 2020.lrec-1.602
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4893–4902
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.602
- Cite (ACL):
- Filip Klubička, Alfredo Maldonado, Abhijit Mahalunkar, and John Kelleher. 2020. English WordNet Random Walk Pseudo-Corpora. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4893–4902, Marseille, France. European Language Resources Association.
- Cite (Informal):
- English WordNet Random Walk Pseudo-Corpora (Klubička et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.602.pdf
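
The abstract describes generating pseudo-corpora by random walks over a taxonomy, with hyperparameters that shape the output. The following is a minimal sketch of that general idea, not the authors' published codebase. The toy taxonomy, the function names, and the hyperparameters `direction`, `stop_prob`, and `max_len` are all assumptions made for illustration; the abstract does not name the hyperparameters the authors actually explored.

```python
import random

# Toy hypernym -> hyponyms map standing in for the WordNet taxonomy.
# Illustrative only; the real resource is built from English WordNet.
TOY_TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree", "flower"],
    "dog": [], "cat": [], "tree": [], "flower": [],
}


def build_parents(children):
    """Invert the child map so walks can also move up the taxonomy."""
    parents = {node: [] for node in children}
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)
    return parents


def neighbours(node, children, parents, direction):
    """Return the nodes reachable in one step for the given direction."""
    if direction == "down":
        return list(children[node])
    if direction == "up":
        return list(parents[node])
    return list(children[node]) + list(parents[node])  # "both"


def random_walk(children, parents, rng, direction="both",
                stop_prob=0.3, max_len=10):
    """Produce one pseudo-sentence: the node labels visited by one walk."""
    node = rng.choice(sorted(children))  # hypothetical choice: uniform start
    sentence = [node]
    while len(sentence) < max_len:
        options = neighbours(node, children, parents, direction)
        # Stop at a dead end, or at random with probability stop_prob.
        if not options or rng.random() < stop_prob:
            break
        node = rng.choice(options)
        sentence.append(node)
    return sentence


def generate_pseudo_corpus(children, n_sentences, seed=0, **walk_kwargs):
    """Generate a pseudo-corpus as a list of pseudo-sentences."""
    rng = random.Random(seed)
    parents = build_parents(children)
    return [random_walk(children, parents, rng, **walk_kwargs)
            for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in generate_pseudo_corpus(TOY_TAXONOMY, 5, direction="both"):
        print(" ".join(sentence))
```

Each pseudo-sentence is a sequence of taxonomically adjacent labels, so varying parameters such as the walk direction or the stopping probability changes sentence lengths and word frequencies. Such a corpus could then be passed to a standard word embedding trainer in place of natural text.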