Embedding Methods for Natural Language Processing

Antoine Bordes, Jason Weston



Abstract
Embedding-based models are popular tools in Natural Language Processing. This tutorial provides an overview of the main advances in this domain. These methods learn latent representations of words, as well as of database entries, which can then be used for semantic search, automatic knowledge base construction, natural language understanding, etc. Our current plan is to split the tutorial into two 90-minute sessions separated by a 30-minute coffee break, covering the basics of learning embeddings in the first session and advanced models in the second. The two parts are detailed below.

Part 1: Unsupervised and Supervised Embeddings
We introduce models that embed tokens (words, database entries) by representing them as low-dimensional embedding vectors. Unsupervised and supervised methods will be discussed, including SVD, Word2Vec, Paragraph Vectors, SSI, Wsabie and others. The methods will be compared in terms of applicability, type of loss function (ranking loss, reconstruction loss, classification loss), regularization, etc. The use of these models in several NLP tasks will be discussed, including question answering, frame identification, knowledge extraction and document retrieval.

Part 2: Embeddings for Multi-relational Data
This second part will focus mostly on the construction of embeddings for multi-relational data, that is, data in which tokens can be interconnected in different ways, as in knowledge bases. Several methods based on tensor factorization, collective matrix factorization, stochastic block models or energy-based learning will be presented. The task of link prediction in a knowledge base will be used as an application example. Multiple empirical results on the use of embedding models to align textual information to knowledge bases will also be presented, together with some demos if time permits.
Anthology ID:
D14-2006
Volume:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month:
October
Year:
2014
Address:
Doha, Qatar
Editors:
Lucia Specia, Xavier Carreras
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/D14-2006
Cite (ACL):
Antoine Bordes and Jason Weston. 2014. Embedding Methods for Natural Language Processing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Doha, Qatar. Association for Computational Linguistics.
Cite (Informal):
Embedding Methods for Natural Language Processing (Bordes & Weston, EMNLP 2014)