Abstract
Non-contiguous word sequences are widely known to be important in modelling natural language. However, they are not explicitly encoded in common text representations. In this work we propose a model for text processing using string kernels, capable of flexibly representing non-contiguous sequences. Specifically, we derive a vectorised version of the string kernel algorithm and its gradients, allowing efficient hyperparameter optimisation as part of a Gaussian Process framework. Experiments on synthetic data and text regression for emotion analysis show the promise of this technique.
- Anthology ID:
- I17-2012
- Volume:
- Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Venue:
- IJCNLP
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 67–73
- URL:
- https://aclanthology.org/I17-2012
- Cite (ACL):
- Daniel Beck and Trevor Cohn. 2017. Learning Kernels over Strings using Gaussian Processes. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 67–73, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Learning Kernels over Strings using Gaussian Processes (Beck & Cohn, IJCNLP 2017)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/I17-2012.pdf