This is the supplementary material for our paper.

The software loads our C2W model, which builds word representations from characters. As a result, the model file is only 18MB (file: rep.gz). It was trained as a language model over 10M tokens of Wikipedia text.
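Because C2W composes a word's vector from its characters, it can produce an embedding for any string, not just words seen in training. The sketch below illustrates that idea only; it is a toy simplification (random character vectors, simple averaging; all names here are hypothetical), whereas the actual C2W model learns character embeddings and composes them with a recurrent network.

```python
import random

random.seed(0)
DIM = 4

# Toy character embeddings, created on demand. In the real C2W model these
# are learned parameters, and composition is done by a recurrent network,
# not the simple averaging used here for illustration.
char_vecs = {}

def char_vec(ch):
    if ch not in char_vecs:
        char_vecs[ch] = [random.uniform(-1, 1) for _ in range(DIM)]
    return char_vecs[ch]

def word_vec(word):
    """Compose a word vector from its character vectors (toy: averaging)."""
    vecs = [char_vec(ch) for ch in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Works for any string, including out-of-vocabulary words:
v = word_vec("Noahshire")
```

The point is that the parameter count depends on the character inventory, not the vocabulary size, which is why the model file stays small.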

To run the software, simply execute:

sh run.sh

(you will need Java 1.8 installed)

The program loads the C2W model, builds embeddings for a given list of words (file: vocab), and then waits for input. Typing the following:

sim Germany

will produce a list of the top 10 words in the vocab file closest to Germany. You can query any word, even one that is not in the vocabulary, such as "Noahshire". Keep in mind that casing matters: if you type germany, the model will not recognize it as a country.
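The `sim` command ranks vocabulary words by their similarity to the query's embedding. A minimal sketch of such a nearest-neighbor lookup, assuming cosine similarity and using hypothetical toy vectors in place of the real C2W embeddings:

```python
import math

# Hypothetical toy embeddings; the real vectors come from the C2W model.
embeddings = {
    "Germany": [0.9, 0.1, 0.3],
    "France":  [0.8, 0.2, 0.3],
    "banana":  [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, vocab_embeddings, k=10):
    """Rank vocabulary words by cosine similarity to the query vector."""
    scored = [(word, cosine(query_vec, vec))
              for word, vec in vocab_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(top_k(embeddings["Germany"], embeddings, k=2))
```

In this toy example, France ranks above banana for the query Germany because its vector points in a similar direction.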

As for the vocab file, it contains the top 10,000 words from the Penn Treebank training data. We kept the list small so that the code can run with limited RAM; as a result, the nearest-neighbor results produced here may differ from those reported in the paper.
