# Additional training details:

All models train on a single NVIDIA GeForce GTX 1080 Ti. The text-only LM trains in approximately 2 hours, and the image recognition models train in about 3 hours. The other text-only models (GloVe, word2vec, and FastText) train in minutes. The language models are trained with negative log-likelihood loss, and we do not propagate losses from targets that represent broken signs (signs labeled "..." or "X").
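The masking of broken-sign targets can be sketched as follows. This is a minimal illustration, not the released training code: the function and label names are placeholders, and the loss is simply a mean negative log-likelihood over the unmasked positions.

```python
import math

# Labels treated as broken signs, excluded from the loss (per the text).
BROKEN = {"...", "X"}

def masked_nll(log_probs, targets):
    """Mean negative log-likelihood, skipping broken-sign targets.

    log_probs: list of dicts mapping sign -> log-probability
    targets:   list of gold signs (may include broken-sign labels)
    """
    losses = [-lp[t] for lp, t in zip(log_probs, targets) if t not in BROKEN]
    return sum(losses) / len(losses) if losses else 0.0

# Toy usage: the broken target "X" contributes nothing to the loss.
log_probs = [
    {"A": math.log(0.5), "B": math.log(0.5)},
    {"A": math.log(0.9), "B": math.log(0.1)},
]
print(masked_nll(log_probs, ["A", "X"]))  # -log(0.5) ≈ 0.693
```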

# Explanation of included figures:

- CG Hierarchy.pdf:
The CG containment hierarchy visualized as a lattice with directed edges from outer signs to the inner signs they can contain. Thicker edges represent CGs that are more strongly compositional. The label at the end of each edge gives the cosine similarity between that CG and the sum of its parts. Nodes are colored by modularity class; each node is most strongly connected to other nodes of the same color.
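The edge labels described above can be computed as follows. This is an illustrative sketch with toy vectors, not the embeddings from the paper: the compositionality score is the cosine similarity between a CG's embedding and the element-wise sum of its parts' embeddings.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def compositionality(cg_vec, part_vecs):
    # Sum the part embeddings element-wise, then compare to the CG embedding.
    summed = [sum(xs) for xs in zip(*part_vecs)]
    return cosine(cg_vec, summed)

# Toy example: a CG whose embedding equals the sum of its parts scores 1.0.
print(compositionality([1.0, 2.0], [[0.5, 1.0], [0.5, 1.0]]))  # 1.0
```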

- Embedding Spaces/glove.64.svg:
Full t-SNE decomposition of the glove.64 embeddings. The inset from Figure 3 in the paper is outlined in red.

- Embedding Spaces/lm.image.64.svg:
Full t-SNE decomposition of the lm.image.64 embeddings. The inset from Figure 3 is outlined in red.

- Embedding Spaces/image_recognition.64.svg:
Full t-SNE decomposition of the image_recognition.64 embeddings. The inset from Figure 3 is outlined in red.
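Figures like the ones above can be produced by projecting the 64-dimensional embeddings to 2-D with t-SNE. A minimal sketch, assuming scikit-learn is available; the random matrix stands in for a real embedding table such as glove.64:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder data: 100 random 64-dimensional vectors standing in for
# the actual sign embeddings (e.g. glove.64).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100, 64))

# Project to 2-D; perplexity must be smaller than the number of samples.
points = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(points.shape)  # (100, 2)
```

The 2-D points can then be plotted and labeled with their sign names to recreate the SVG layouts.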
