Abstract
In the recent past, NLP as a field has seen tremendous utility of distributional word vector representations as features in downstream tasks. The fact that these word vectors can be trained on unlabeled monolingual corpora of a language makes them an inexpensive resource in NLP. With the increasing use of monolingual word vectors, there is a need for word vectors that can be used as efficiently across multiple languages as monolingually. Therefore, learning bilingual and multilingual word embeddings/vectors is currently an important research topic. These vectors offer an elegant and language-pair independent way to represent content across different languages.

This tutorial aims to bring NLP researchers up to speed with the current techniques in cross-lingual word representation learning. We will first discuss how to induce cross-lingual word representations (covering both bilingual and multilingual ones) from various data types and resources (e.g., parallel data, comparable data, non-aligned monolingual data in different languages, dictionaries and thesauri, or even images and eye-tracking data). We will then discuss how to evaluate such representations, intrinsically and extrinsically. We will introduce researchers to state-of-the-art methods for constructing cross-lingual word representations and discuss their applicability in a broad range of downstream NLP applications.

We will deliver a detailed survey of the current methods, discuss best training and evaluation practices and use-cases, and provide links to publicly available implementations, datasets, and pre-trained models.

- Anthology ID:
- D17-3007
- Volume:
- Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Alexandra Birch, Nathan Schneider
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- URL:
- https://preview.aclanthology.org/add_missing_videos/D17-3007/
- Cite (ACL):
- Manaal Faruqui, Anders Søgaard, and Ivan Vulić. 2017. Cross-Lingual Word Representations: Induction and Evaluation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Cross-Lingual Word Representations: Induction and Evaluation (Faruqui et al., EMNLP 2017)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/D17-3007.pdf