### Verb Prediction

#### Models
1. indePred.py: RNN model
1. WeightDrop.py: weight dropout for the RNN

#### Utils  
1. process_data.py:  
    builds (tensor, label) pairs from the preprocessed Japanese data, or (tensor, POS tag, label) tuples from the German data; both are stored as .csv files.  
    (Ja.csv has three columns, "preverb", "lemma" and "inflected", while de.csv has an additional "postag" column.)
1. utils.py: helper functions
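
The conversion done by process_data.py can be sketched as below. This is a minimal sketch, assuming the column names listed above, space-separated tokens in "preverb", and a simple frequency-ranked vocabulary; `build_vocab`, `load_samples` and the `<unk>` token are hypothetical names, and plain index lists stand in for the tensors the real code builds.

```python
import csv
from collections import Counter

UNK = "<unk>"  # hypothetical token for out-of-vocabulary words


def build_vocab(sentences, max_size):
    """Map the max_size most frequent tokens to indices 1..max_size; 0 is UNK."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {UNK: 0}
    for tok, _ in counts.most_common(max_size):
        vocab[tok] = len(vocab)
    return vocab


def load_samples(lines, lang="de", use_lemma=True, vocab_size=50000):
    """Return (ids, label) pairs for 'ja' or (ids, postag, label) tuples for 'de'.

    `lines` is any iterable of CSV lines (e.g. an open file). Plain index
    lists stand in for the tensors the real process_data.py builds.
    """
    rows = list(csv.DictReader(lines))
    vocab = build_vocab([r["preverb"] for r in rows], vocab_size)
    target = "lemma" if use_lemma else "inflected"  # mirrors the use_lemma option
    samples = []
    for r in rows:
        ids = [vocab.get(tok, vocab[UNK]) for tok in r["preverb"].split()]
        if lang == "de":
            samples.append((ids, r["postag"], r[target]))
        else:
            samples.append((ids, r[target]))
    return samples
```

With `open("de.csv", encoding="utf-8")` as `lines`, the same function would read a real file, since `csv.DictReader` accepts any iterable of lines.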


#### Data
1. Preprocessed data is available below (download it and change the dataset path in config.py accordingly):  
    1. [Japanese](https://drive.google.com/open?id=176DBx8a4VvJJridF_JvWWghKxhalvOka)  
    1. [German Wortschatz](https://drive.google.com/open?id=1qsZW8PjhsXn-NruzNGnXcd3Dj5__Y0zP)
    1. [German pretrained embeddings](https://drive.google.com/open?id=19e6JsEUBVmSUQ8n0Bj6-4WODUEDEMYgU)
    (The German pretrained embeddings are extracted from the [FastText embeddings](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md)  
    for the 100K most frequent words of the German Wortschatz corpus, since the original embedding file is too large.)
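
The extraction step above can be sketched as follows, assuming the FastText `.vec` text format (a `<num_words> <dim>` header, then one `<word> <v1> ... <v_dim>` line per word) and whitespace-tokenized Wortschatz sentences. `top_k_words`, `filter_vec` and `to_vec_text` are hypothetical helpers, not functions from this repository.

```python
from collections import Counter


def top_k_words(sentences, k=100_000):
    """Return the k most frequent whitespace-separated tokens of a corpus."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return {word for word, _ in counts.most_common(k)}


def filter_vec(vec_lines, keep):
    """Keep only the vectors of words in `keep` from FastText .vec text.

    A .vec file starts with a "<num_words> <dim>" header, followed by one
    "<word> <v1> ... <v_dim>" line per word.
    """
    lines = iter(vec_lines)
    dim = int(next(lines).split()[1])
    kept = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        if word in keep:
            kept[word] = [float(v) for v in values]
    return dim, kept


def to_vec_text(dim, vectors):
    """Serialize the filtered vectors back into the same .vec text format."""
    out = [f"{len(vectors)} {dim}"]
    for word, vec in vectors.items():
        out.append(word + " " + " ".join(repr(v) for v in vec))
    return "\n".join(out) + "\n"
```

Because `filter_vec` only iterates over lines, an open file handle can be passed directly, so the multi-gigabyte original file is streamed rather than loaded into memory.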

#### Codes
1. config.py: parses parameters using fire (see the requirements in environment.yml)
1. indevp.py: trains and tests the basic RNN model


- *Run*
    1. Install the environment and requirements from environment.yml.    
    1. Fire is used to parse the parameters; the default parameters are in config.py.    
        1. train: `python indevp.py train`
        1. test: `python indevp.py predict`
        1. change parameters: either edit config.py or pass them on the command line, `python indevp.py train --<para1_name>=<your value> --<para2_name>=<your value>`  
            e.g. `python indevp.py train --num_epochs=20 --data_dir='./data'` 
    1. visualization
        1. plot heatmaps: `python visualize_attention.py -i ../output/ja_word_pos_attn/ -o ../output/ja_word_pos_attn/plots -n 50`
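
The parameter override described above can be sketched like this. It assumes config.py holds defaults as class attributes and that indevp.py exposes its commands with `fire.Fire()`; `DefaultConfig` and `parse` are hypothetical names, and the real files may be organized differently.

```python
class DefaultConfig:
    """Hypothetical stand-in for config.py: class attributes hold the defaults."""
    data_dir = './data/'
    out_dir = './output/'
    lang = 'de'
    num_epochs = 10
    lr = 0.0001

    def parse(self, kwargs):
        """Override defaults with keyword arguments, rejecting unknown names."""
        for key, value in kwargs.items():
            if not hasattr(self, key):
                raise AttributeError(f"unknown parameter: {key}")
            setattr(self, key, value)  # instance attribute shadows the class default
        return self


def train(**kwargs):
    opt = DefaultConfig().parse(kwargs)
    # ... build the data loaders and the model from `opt`, then train ...
    return opt


# In indevp.py, `import fire` plus `fire.Fire()` at module level would expose
# functions such as train as subcommands, so that
#   python indevp.py train --num_epochs=20 --data_dir='./data'
# ends up calling train(num_epochs=20, data_dir='./data').
```

Rejecting unknown names catches typos such as `--num_epoch=20`, which would otherwise be silently ignored.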
        

#### Results
1. Some recent [results](https://drive.google.com/open?id=1QTsaLEMRVyRQJb4-HQivsP4OIDK_Br--) are available  
   (attention maps, precision-recall reports for verbs, etc.).
    
#### Parameters
-   general parameters  
    - data_dir = './data/' : input data path  
    - out_dir = './output/' : output file path  
    - ckp_dir = './checkpoint' : checkpoint file path
- data processing parameters
    - lang = 'de' : language option ('de' or 'ja')
    - lang_mode = 'word' : language model option ('char' or 'word')
    - loss_func = 'cross_entropy' : loss function option ('max_margin' or 'cross_entropy')
    - use_lemma = True : target option; if True, use the lemma as the target, else use the exact (inflected) verb
    - balance_flag = False : if true, use an equal number of samples for each class (verb) in training
    - source_vocab_size = 50000 : vocabulary size for the word model; the character model automatically uses all characters
    - use_pretrained_embed = False : only available for German; if true, use FastText pretrained word embeddings
    - split_p = 0 : training data size control (a float between 0 and 1)
    - portion = [0.3, 0.5, 0.7, 0.9, 1.0] : get subsentences of length = p*len(complete_sentence) for each p in portion
    - reverse = False : if true, reverse the portion order in training, i.e. train with the longer subsentences first
    - fix_portion = 0 : if non-zero, train and test only on subsentences of length = fix_portion*len(complete_sentence); if zero, use all subsentences

- model set up parameters
    - model = 'indePred'
    - rnn_type = 'GRU' : RNN type ('GRU' or 'LSTM'; LSTM is not tested yet)
    - sampler = 'random' : sampler option ('weighted', 'random' or 'None')
    - shuffling = True : ignored if sampler is not None

    - add_pos_embed = True : if true, add a POS tag embedding to the embedding vector (only applicable to German data)
    - cnn_rnn = False : if true, add a CNN before the RNN
    - apply_attn = True : if true, apply attention and obtain a weighted context vector
    - attn_type = 'structured_self_attn' : attention type ('structured_self_attn' or 'self_attn')
    - if attn_type=='structured_self_attn'  
        - da = 128          : linear layer number of units in attention model
        - r = 5             : number of hops for one sentence
        - follow_paper      : if true then follow the attention assignment in [this paper](https://arxiv.org/abs/1703.03130)
        
    - combine_hidden = False : if False, concatenate the hidden states; if True, combine them
    - add_target_embed = False : if True, apply target embeddings and use a bilinear output layer; if False, use a linear output layer

- model hyper-parameters
    - embed_size = 300 : embedding size of the input vocabulary (300 by default, matching the pretrained German embeddings)
    - hidden_size = 512 : RNN hidden size (rule of thumb: hidden_size ≈ number_of_training_samples / (alpha * (embed_size + output_size)) for a scaling factor alpha)
    - batch_size = 256
    - nlayers = 2
    - log_interval = 500 : get log info after log_interval batches
    - eval_interval = 2000 :  eval on the dev set after eval_interval batches
    - lr = 0.0001   : learning rate
    - lr_decay = 0.1 
    - drop_embed_prob = 0.4 :  embedding dropout
    - variational_dropout_prob = 0
    - drop_rnn_prob = 0
    - drop_out_prob = 0 : output layer dropout
    - is_bidirectional = True

    - num_epochs = 10
    - load_ckp = False : if true, continue training using previous checkpoint
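
How `portion`, `reverse` and `fix_portion` interact can be sketched as below. This is one plausible reading, assuming that subsentences are sentence prefixes (the words seen so far), that lengths are floored with a minimum of one token, and that `reverse=True` simply orders the portions from longest to shortest; `make_subsentences` is hypothetical and the rounding in the real code may differ.

```python
def make_subsentences(tokens, portion=(0.3, 0.5, 0.7, 0.9, 1.0),
                      reverse=False, fix_portion=0):
    """Return sentence prefixes whose lengths are fractions of the full sentence.

    Assumes prefixes of length max(1, int(p * len(tokens))).
    """
    if fix_portion:                       # a non-zero fix_portion overrides portion
        ps = [fix_portion]
    else:
        ps = sorted(portion, reverse=reverse)  # reverse=True: longest prefix first
    return [tokens[:max(1, int(p * len(tokens)))] for p in ps]
```

For a 10-token sentence, the default settings yield prefixes of 3, 5, 7, 9 and 10 tokens, and `fix_portion=0.5` yields only the 5-token prefix.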

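
For reference, the structured self-attention of [Lin et al. (2017)](https://arxiv.org/abs/1703.03130), which `da` and `r` parameterize, is computed from the matrix $H \in \mathbb{R}^{n \times 2u}$ of bidirectional RNN hidden states for a sentence of $n$ tokens ($u$ hidden units per direction). This is the paper's formulation; how `follow_paper` changes the variant used in this repository is not documented here.

$$
A = \operatorname{softmax}\left(W_{s2} \tanh\left(W_{s1} H^{\top}\right)\right), \qquad W_{s1} \in \mathbb{R}^{d_a \times 2u},\; W_{s2} \in \mathbb{R}^{r \times d_a}
$$

$$
M = A H \in \mathbb{R}^{r \times 2u}, \qquad P = \left\lVert A A^{\top} - I \right\rVert_F^2
$$

The softmax is taken along each of the $r$ rows, so each hop is a distribution over the $n$ tokens; $d_a$ corresponds to `da` (128) and $r$ to `r` (5). $M$ is the sentence representation, and $P$ is the penalty the paper adds to the loss to encourage the hops to attend to different tokens.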
    
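
With `sampler='weighted'`, rare verbs need to be drawn more often than frequent ones. A common scheme, assumed here rather than taken from the code, gives each training sample the weight 1 / count(its label), so that every class receives the same total weight. In PyTorch, such per-sample weights are what `torch.utils.data.WeightedRandomSampler` accepts. The standard-library sketch below uses hypothetical helper names.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples with the same label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_indices(weights, num_samples, seed=0):
    """Draw sample indices with replacement, proportionally to `weights`."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)
```

With labels `["sein", "sein", "sein", "lesen"]`, the single "lesen" sample gets weight 1.0 and each "sein" sample 1/3, so both verbs are drawn about equally often.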


