# Book dataset
#### Dataset location : [LREC-COLING2024_Dataset](https://drive.google.com/drive/u/1/folders/11dBJ63S80xHNCcAhMT498y-fZyZTDuqx)

#### This dataset originates from [Book-Crossing](https://grouplens.org/datasets/book-crossing/). We include the part of the original data that we used for our preprocessing.

- `BX-Book-Ratings.csv` : Original user rating data (User-ID, ISBN, Book-Rating).
- `BX-Books.csv` : Book information (ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher).
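The ratings file listed above can be read with the standard library alone. The sketch below is a minimal example, not the preprocessing used in this project: the `read_ratings` helper is hypothetical, and the `;` delimiter and `latin-1` encoding are assumptions based on the usual Book-Crossing dump, so adjust them if your copy differs.

```python
import csv
from typing import Iterator, Tuple


def read_ratings(
    path: str, delimiter: str = ";", encoding: str = "latin-1"
) -> Iterator[Tuple[str, str, int]]:
    """Yield (user_id, isbn, rating) tuples from a ratings CSV.

    Assumes a header row with the columns User-ID, ISBN and Book-Rating.
    The delimiter and encoding defaults are assumptions, not guarantees.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for row in reader:
            # ISBNs stay as strings: they can end in "X" or start with "0".
            yield row["User-ID"], row["ISBN"], int(row["Book-Rating"])


# Example usage (requires the file to be present locally):
# for user_id, isbn, rating in read_ratings("BX-Book-Ratings.csv"):
#     ...
```

ISBNs are kept as strings on purpose, since converting them to integers would drop leading zeros and fail on check digits such as `X`.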

#### The [RippleNet](https://github.com/hwwang55/RippleNet/tree/master/data/book) authors built a knowledge graph (KG) using Microsoft Satori.

- `item_index2entity_id.txt` : Mapping table from Book-Crossing ISBNs to index numbers.
- `kg.txt` : KG triples (book, relation, entity).
- `kg_relation.txt` : KG relations in text form.
- `ratings_final.txt` : User rating data processed according to the RippleNet authors' assumptions.
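As an illustration of how the triples in `kg.txt` might be consumed, the sketch below groups them by head entity. It assumes one whitespace-separated `head relation tail` triple per line, which is not confirmed by this README, and the `build_adjacency` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_adjacency(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group KG triples by head: head -> [(relation, tail), ...].

    Assumes each line holds exactly three whitespace-separated fields.
    Lines that do not match this shape are skipped.
    """
    adjacency: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or malformed line
        head, relation, tail = parts
        adjacency[head].append((relation, tail))
    return dict(adjacency)


# Example usage (requires the file to be present locally):
# with open("kg.txt", encoding="utf-8") as f:
#     kg = build_adjacency(f)
```

All fields are kept as strings so the same code works whether relations are stored as names or as numeric IDs.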

#### Positive/negative pairs for our proposed contrastive loss.

- `pos_neg_pairs_from_genres.txt` : Pairs built from genre information only.
- `pos_neg_pairs_from_title_genres.txt` : Pairs built from both genre and title information.

#### Semantic text embedding.

- `synop_emb_book.npy` : Embeddings of human-written descriptions (from [Goodreads](https://www.goodreads.com/) and [GoogleBooks](https://books.google.com/)), encoded with BERT_base.
- `synop_emb_book_llama.npy` : Embeddings of descriptions generated with LLaMA, encoded with BERT_base.

** Because the descriptions were obtained by crawling or generated through a personal API for research purposes, we are unable to make them publicly available.
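The `.npy` embedding files can be inspected with NumPy. The sketch below is only an illustration: it assumes each file stores a plain 2-D float array with one row per book, which this README does not state, and the `load_embeddings` and `cosine_similarity` helpers are hypothetical.

```python
import numpy as np


def load_embeddings(path: str) -> np.ndarray:
    """Load an embedding matrix and check that it is two-dimensional.

    Assumption: rows are books and columns are embedding dimensions.
    """
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


# Example usage (requires the file to be present locally):
# emb = load_embeddings("synop_emb_book.npy")
# print(emb.shape, cosine_similarity(emb[0], emb[1]))
```

How the rows line up with the book indices in the other files is not documented here, so verify the alignment before joining embeddings to ratings or KG entities.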
