# Sentence Classification in TensorFlow

This project is a close TensorFlow implementation of Yoon Kim's paper [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/abs/1408.5882) (EMNLP 2014), with an option to use the [ELMo word embeddings](http://allennlp.org/elmo). His original Theano code can be found [here](https://github.com/yoonkim/CNN_sentence). As an alternative, you can look at Denny Britz's TensorFlow implementation [here](https://github.com/dennybritz/cnn-text-classification-tf).

## Setup

1. Download Google's `word2vec` embeddings and place them inside `data/w2v/`. This is a large file (about `3.5G`). One way to get it is to `git clone` [this repository](https://github.com/mmihaltz/word2vec-GoogleNews-vectors).
2. Download the folder [here](https://drive.google.com/open?id=1HlD3hWdGRLpsroOaE2eaj447LirNphxa) and place it in the `data/` directory.
3. Ensure you have a working `tensorflow` or `tensorflow-gpu` installation (version 1.7+ for ELMo). Additional dependencies are `yaml`, `bunch`, `cPickle` and `tensorflow-hub`.
4. Pre-process the data:
   ```
   cd data
   chmod +x process_sst2_sentence.sh
   ./process_sst2_sentence.sh
   ```
5. Run `python main.py --no-cache --seed 0` to train the model on sentence-level labels.

## Model Configuration
The model hyperparameters are in `config/nonstatic.yml`. All hyperparameters (except `batch_size`) are identical to those reported in the paper. You can change the training directory with the `--job_id` parameter and the random seed with `--seed`. See `config/arguments.py` for more details.
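
As a rough illustration of how these command-line flags fit together, here is a minimal `argparse` sketch. Only the flag names `--job_id`, `--seed` and `--no-cache` come from this README; the defaults, help strings and the `build_parser` helper are assumptions, and the real definitions in `config/arguments.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the flag names are taken from this README."""
    parser = argparse.ArgumentParser(description="Train the sentence classifier.")
    # Assumed default: the real default job id may be different.
    parser.add_argument("--job_id", type=str, default="default",
                        help="Name used to pick the training directory.")
    parser.add_argument("--seed", type=int, default=0,
                        help="Random seed.")
    # Assumed meaning: skip any cached, pre-processed data.
    parser.add_argument("--no-cache", dest="no_cache", action="store_true",
                        help="Do not reuse cached data.")
    return parser


# Equivalent to running: python main.py --no-cache --seed 0
args = build_parser().parse_args(["--no-cache", "--seed", "0"])
print(args.job_id, args.seed, args.no_cache)
```

Passing a different `--job_id` for each run would keep the training directories of separate experiments apart.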

To switch ELMo on or off, set the last line of the configuration file to `elmo: true` or `elmo: false`. Please ignore the hyperparameters under the headings `# mixer BiLSTM`, `# gradient configuration`, `# iterative configuration` and `## Interpolated Loss Configuration`. We will clean up this part before open-sourcing the code.
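
If you would rather flip the ELMo flag from a script than edit the file by hand, a small text-based helper along these lines could work. This is only a sketch: `set_elmo` is a hypothetical helper, it assumes the flag sits on its own unindented line of the form `elmo: <value>`, and the sample text below is a made-up excerpt rather than the real contents of `config/nonstatic.yml`.

```python
import re


def set_elmo(config_text, enabled):
    """Return config_text with its unindented `elmo:` line set to true or false."""
    value = "true" if enabled else "false"
    # Rewrite only lines that start with `elmo:`; every other line is untouched.
    new_text, count = re.subn(r"^elmo:.*$", "elmo: " + value,
                              config_text, flags=re.MULTILINE)
    if count == 0:
        raise ValueError("no `elmo:` line found in the configuration")
    return new_text


# Hypothetical excerpt; the batch_size value here is a placeholder.
sample = "batch_size: 32\nelmo: false\n"
updated = set_elmo(sample, True)
print(updated)
```

To apply this to the real file, read `config/nonstatic.yml` into a string, pass it through `set_elmo`, and write the result back.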
