## Similarity

#### BOTS

python similarity/calc_bots.py --in_dir llama_preds/old_and_opp_for_mix/ --preds_json data/mix_test_25_3000_with_opp_sim.json --bots_out_json bots_out.json

#### C2V Training

python ./data/cooccurrence_matrix/get_user_counts.py --in_file G:\reddit\reddit\comments\RC_2019-11 --out_dir ./user_counts

python ./data/cooccurrence_matrix/create_cooccurrence_matrix.py --in_dir ./user_counts --out_dir ./coocc_mat_dir

python similarity/community2vec.py
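
For orientation, the idea behind a user-based subreddit co-occurrence matrix can be sketched as follows. This is a minimal illustration, not the code in create_cooccurrence_matrix.py: the input layout (a dict mapping each user to per-subreddit comment counts) and the function name `cooccurrence_counts` are assumptions, and the real scripts may weight or filter pairs differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_counts):
    """Count, for each pair of subreddits, how many users commented in both.

    user_counts: {user: {subreddit: n_comments}} (hypothetical layout).
    Returns a Counter keyed by alphabetically ordered (sub_a, sub_b) tuples.
    """
    pairs = Counter()
    for subs in user_counts.values():
        # Each user contributes 1 to every pair of subreddits they posted in.
        for a, b in combinations(sorted(subs), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy data, made up for illustration only.
example = {
    "u1": {"python": 3, "learnprogramming": 1},
    "u2": {"python": 1, "learnprogramming": 2, "cooking": 5},
    "u3": {"cooking": 1},
}
counts = cooccurrence_counts(example)
print(counts[("learnprogramming", "python")])
```

Sorting the subreddit names before pairing keeps each pair under a single key, so the result corresponds to the upper triangle of a symmetric matrix.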

#### LLAMA2 0-shot

python similarity/llm_inference_sim.py --model_id "meta-llama/Llama-2-7b-chat-hf" --input_db /mnt/e/influence_risk_analysis/data/mix_opp_3000.db --out_path f1_sims_l2_7B_0shot.json --exp_name L2_7B_0S --model_family LLAMA2 --num_examples 0

#### LLAMA2 5-shot

python similarity/llm_inference_sim.py --model_id "meta-llama/Llama-2-7b-chat-hf" --input_db /mnt/e/influence_risk_analysis/data/mix_opp_3000.db --out_path f1_sims_l2_7B_5shot.json --exp_name L2_7B_5S --model_family LLAMA2 --num_examples 5

#### Correlation with Subreddit Bias Ratings

See the notebook Subreddit Bias Ratings.ipynb.

#### Bidirectional Hits@n

See the notebook Bidirectional Hits@n.ipynb.

#### Emb-PSR

python similarity/emb_psr_single_step_calc.py --data_dir=/mnt/e/reddit/emb_psr/combined/db/ --subs=./16c_input/16cat.json --out_dir=./16c_output_2/ --title_col='title' --model_name all-mpnet-base-v2

To tune the standard deviation hyperparameter and pick the best value, see the notebook Pick Best Standard Deviation.ipynb.

python similarity/calc_hits_at_n.py --cat2sub_json '/mnt/f/Github/influence_risk_analysis/16c_input/cat2sub_16c.json' --similarity_json '/mnt/f/Github/influence_risk_analysis/emb_psr_16cat_sims.json'
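
As a rough guide to what a Hits@n score measures, here is a minimal sketch under stated assumptions: a query subreddit counts as a hit if at least one of its n most similar subreddits shares its category. The JSON layouts (`{category: [subreddits]}` and `{subreddit: {other: score}}`) and the function name `hits_at_n` are hypothetical. The definition used by calc_hits_at_n.py may differ.

```python
def hits_at_n(similarity, cat2sub, n=5):
    """Fraction of query subreddits with a same-category neighbour in the top n.

    similarity: {subreddit: {other_subreddit: score}} (assumed layout).
    cat2sub:    {category: [subreddit, ...]}          (assumed layout).
    """
    # Invert the category mapping so each subreddit can be looked up directly.
    sub2cat = {s: c for c, subs in cat2sub.items() for s in subs}
    hits = 0
    total = 0
    for query, scores in similarity.items():
        if query not in sub2cat:
            continue  # skip subreddits without a category label
        ranked = sorted(
            (s for s in scores if s != query),
            key=lambda s: scores[s],
            reverse=True,
        )
        total += 1
        if any(sub2cat.get(s) == sub2cat[query] for s in ranked[:n]):
            hits += 1
    return hits / total if total else 0.0


# Toy data, made up for illustration only.
cat2sub = {"tech": ["python", "rust"], "food": ["cooking", "baking"]}
similarity = {
    "python": {"rust": 0.9, "cooking": 0.1, "baking": 0.2},
    "rust": {"python": 0.8, "cooking": 0.3, "baking": 0.1},
    "cooking": {"python": 0.5, "rust": 0.4, "baking": 0.3},
    "baking": {"cooking": 0.7, "python": 0.2, "rust": 0.1},
}
print(hits_at_n(similarity, cat2sub, n=1))
```

In the toy data, only `cooking` has a top-ranked neighbour from another category, so three of the four queries are hits at n=1.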

#### C2V W2V Hits@n (comparison against Emb-PSR)

python similarity/calc_c2v_w2v_cos_sim.py --w2v_in_path /mnt/c/Users/anon/Downloads/c2v_out_all_years/c2v_out_all_years/best_model/word2vec.pickle --out_path ./c2v_w2v_sim_16cat.json --subs_json ./16c_input/16cat.json
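
The cosine similarity that this step computes between embedding vectors can be sketched in plain Python. The structure of word2vec.pickle is not documented here, so the sketch assumes the embeddings are already available as a dict from subreddit name to a list of floats. The helper names `cosine_similarity` and `pairwise_similarity` are illustrative, not taken from the script.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors):
    """Build {name: {other_name: cosine}} for every pair of distinct names.

    vectors: {name: [float, ...]} (assumed layout).
    """
    names = sorted(vectors)
    return {
        a: {b: cosine_similarity(vectors[a], vectors[b]) for b in names if b != a}
        for a in names
    }


# Toy 2-d vectors, made up for illustration only.
vectors = {"python": [3.0, 4.0], "rust": [6.0, 8.0], "cooking": [-4.0, 3.0]}
sims = pairwise_similarity(vectors)
print(sims["python"]["rust"], sims["python"]["cooking"])
```

The nested-dict output mirrors the `{subreddit: {other: score}}` shape assumed for the similarity JSON, which is itself an assumption about the files passed to `--similarity_json`.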

python similarity/calc_hits_at_n.py --cat2sub_json '/mnt/f/Github/influence_risk_analysis/16c_input/cat2sub_16c.json' --similarity_json '/mnt/f/Github/influence_risk_analysis/c2v_w2v_sim_16cat.json'

