Tomáš Mikolov

Also published as: Tomas Mikolov

2018

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
Armand Joulin | Piotr Bojanowski | Tomas Mikolov | Hervé Jégou | Edouard Grave
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a quadratic problem to learn an orthogonal matrix aligning a bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. Our experiments on standard benchmarks show that our approach outperforms the state of the art on word translation, with the biggest improvements observed for distant language pairs such as English-Chinese.
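The baseline that the abstract attributes to existing works (fit an orthogonal matrix on a bilingual lexicon, then retrieve translations) can be sketched as follows. This is an illustration only, not the method proposed in the paper: it assumes the quadratic problem is the standard orthogonal Procrustes problem solved by SVD, uses plain cosine nearest-neighbor retrieval, and runs on synthetic vectors. The function names `procrustes_align` and `retrieve` are hypothetical.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal Q minimizing ||X @ Q - Y||_F.

    Rows of X (source) and Y (target) are assumed to be paired by the lexicon.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def retrieve(query, targets):
    """Return the index of the target row most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    T = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return int(np.argmax(T @ q))

# Synthetic stand-in for a bilingual lexicon (assumption, not real embeddings):
# the target vectors are an exact rotation of the source vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Q_true, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ Q_true

Q = procrustes_align(X, Y)
```

In this toy setting the mapping is recovered exactly, so mapping a source row through `Q` and calling `retrieve` returns its paired target row. Real embeddings are only approximately related by a rotation, which is where the choice of retrieval criterion matters.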

Advances in Pre-Training Distributed Word Representations
Tomas Mikolov | Edouard Grave | Piotr Bojanowski | Christian Puhrsch | Armand Joulin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Learning Word Vectors for 157 Languages
Edouard Grave | Piotr Bojanowski | Prakhar Gupta | Armand Joulin | Tomas Mikolov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Bag of Tricks for Efficient Text Classification
Armand Joulin | Edouard Grave | Piotr Bojanowski | Tomas Mikolov
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

Enriching Word Vectors with Subword Information
Piotr Bojanowski | Edouard Grave | Armand Joulin | Tomas Mikolov
Transactions of the Association for Computational Linguistics, Volume 5

Continuous word representations trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
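The representation described in this abstract, a word as the sum of its character n-gram vectors, can be illustrated with a minimal sketch. The boundary markers `<` and `>`, the n-gram range of 3 to 6, the 4-dimensional vectors, and the names `char_ngrams` and `word_vector` are assumptions made for this example; the abstract does not specify them.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

def word_vector(word, table, dim=4):
    """Sum the vectors of the word's n-grams.

    `table` maps an n-gram to a list of `dim` floats (hypothetical trained
    parameters); n-grams missing from the table contribute nothing.
    """
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        for k, value in enumerate(table.get(gram, [0.0] * dim)):
            vec[k] += value
    return vec

# Toy n-gram table (made-up values, for illustration only).
table = {
    "<wh": [1.0, 0.0, 0.0, 0.0],
    "re>": [0.0, 2.0, 0.0, 0.0],
}
seen = word_vector("where", table)
unseen = word_vector("when", table)  # shares "<wh" with "where"
```

Because a vector is built from n-grams rather than looked up per word, a word absent from training, such as `when` here, still receives a representation from the n-grams it shares with known words.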
