# Movie dataset
#### The dataset can be downloaded from: [LREC-COLING2024_Dataset](https://drive.google.com/drive/u/1/folders/11dBJ63S80xHNCcAhMT498y-fZyZTDuqx)

#### This dataset originally comes from [MovieLens](http://movielens.org) (ml-20m), a movie recommendation service. We include only the part of the original data that we used for our preprocessing.

- `links.csv` : Identifiers that can be used to link to other sources of movie data (movieId, [imdbId](http://www.imdb.com), [tmdbId](https://www.themoviedb.org)).
- `movies.csv` : Movie information (movieId, title, genres). Movie titles are entered manually or imported from [TMDB](https://www.themoviedb.org), and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.
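
Since titles carry the release year in parentheses and may be inconsistent, a reader will usually want to split the year out defensively. The following is a minimal sketch, assuming `movies.csv` has a `movieId,title,genres` header row and pipe-separated genres as in the original MovieLens release; the function name `parse_movies` and the sample rows are hypothetical and only illustrate the layout.

```python
import csv
import io
import re

# Matches a trailing four-digit year in parentheses, e.g. "Some Title (1999)".
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def parse_movies(lines):
    """Yield one dict per row of a movies.csv-style file (hypothetical helper)."""
    for row in csv.DictReader(lines):
        title = row["title"].strip()
        match = YEAR_RE.search(title)
        yield {
            "movieId": int(row["movieId"]),
            "title": title,
            # None when no year can be parsed, since titles may be inconsistent.
            "year": int(match.group(1)) if match else None,
            # Assumption: genres are separated by "|".
            "genres": row["genres"].split("|"),
        }


# Made-up rows in the assumed layout, not real file contents.
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Example Movie (1999),Adventure|Comedy\n"
    '2,"Untitled, Example",Drama\n'
)
movies = list(parse_movies(sample))
print(movies[0]["year"], movies[1]["year"])
```

For the real file, pass `open("movies.csv", newline="", encoding="utf-8")` instead of the in-memory sample.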

#### The [KGCN](https://github.com/hwwang55/KGCN/tree/master) authors built a knowledge graph (KG) using Microsoft Satori.

- `item_index2entity_id.txt` : Mapping table from MovieLens movieId to an indexed number.
- `kg.txt` : Triples of the KG (movie, relation, entity).
- `kg_relation.txt` : KG relations presented as text.
- `ratings_final.txt` : User rating data, prepared under the KGCN authors' assumptions.
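
To give a sense of how the triples might be consumed, the sketch below reads (movie, relation, entity) lines and groups them by head entity. It assumes each line of `kg.txt` holds exactly three whitespace-separated fields; the delimiter, the helper names `load_triples` and `build_adjacency`, and the sample values are assumptions for illustration, so check them against the actual file.

```python
from collections import defaultdict


def load_triples(lines):
    """Parse whitespace-separated (head, relation, tail) lines into tuples."""
    triples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        triples.append(tuple(fields))
    return triples


def build_adjacency(triples):
    """Map each head to the list of (relation, tail) pairs leaving it."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


# Made-up triples in the assumed layout, not real file contents.
sample = ["0\t1\t100\n", "0\t2\t101\n", "\n", "5\t1\t100\n"]
triples = load_triples(sample)
adjacency = build_adjacency(triples)
print(len(triples), adjacency["0"])
```

Fields are kept as strings so the sketch works whether relations are stored as numbers or as text; for the real file, pass `open("kg.txt", encoding="utf-8")`.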

#### Positive/negative pairs for the contrastive loss that we propose.

- `pos_neg_pairs_from_genres.txt` : Pairs built from genre information only.
- `pos_neg_pairs_from_title_genres.txt` : Pairs built from both genre and title information.

#### Semantic text embedding.

- `synop_emb_movie.npy` : Embeddings of human-written synopses (from [TMDB](https://www.themoviedb.org)), encoded with BERT_base.
- `synop_emb_movie_llama.npy` : Embeddings of synopses generated with LLaMA, encoded with BERT_base.
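
The `.npy` files can be read with NumPy. The sketch below is only an illustration: it assumes each file holds a 2-D array with one embedding per row, and it uses a small stand-in file with made-up values, because the array shape and the row ordering are not documented here. The helper names `load_embeddings` and `cosine_similarity` are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_embeddings(path):
    """Load a .npy embedding matrix (assumed shape: rows x embedding_dim)."""
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in file with made-up values; replace the path with synop_emb_movie.npy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_emb.npy")
    np.save(path, np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]))
    emb = load_embeddings(path)

shape = emb.shape
sim_same = cosine_similarity(emb[0], emb[1])
sim_orth = cosine_similarity(emb[0], emb[2])
print(shape, sim_same, sim_orth)
```

Comparing the same row across `synop_emb_movie.npy` and `synop_emb_movie_llama.npy` in this way is only meaningful if both files share the same row order, which should be verified first.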

** Because the synopses were obtained by crawling or generated through a personal API for research purposes, we cannot make them publicly available.