# TSI Framework
We provide our implementation of the TSI framework. We will release it via the authors' repository to support reproducibility.

# Datasets
In this work, we utilize two public datasets: CSFCube and DORIS-MAE.
The necessary files are included in the provided Dataset folder.
They can also be downloaded from the original repositories:
- CSFCube: https://github.com/iesl/CSFCube
- DORIS-MAE: https://github.com/kwang927/Doris-Mae-Dataset

# Taxonomy
In this work, we use the field-of-study taxonomy from Microsoft Academic.
The taxonomy files are included in the provided Dataset folder.
- mag_filed_of_studies.taxo: the connections between taxonomy nodes
- mag_filed_of_studies.terms: the topic name of each node
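For illustration, the two taxonomy files could be loaded along the following lines. This is a minimal sketch that assumes, hypothetically, that each .taxo line holds one tab-separated parent/child pair of node ids and each .terms line holds a tab-separated node id and topic name; the actual layout may differ, so check the files before relying on it. The helper names (parse_terms, parse_taxonomy, load_file, subtopics) are illustrative and not part of the released code.

```python
from collections import defaultdict


def parse_terms(lines):
    """Map node id -> topic name.

    Assumed (hypothetical) line format: node_id<TAB>topic name
    """
    terms = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            node_id, name = line.split("\t", 1)
            terms[node_id] = name
    return terms


def parse_taxonomy(lines):
    """Map parent id -> list of child ids.

    Assumed (hypothetical) line format: parent_id<TAB>child_id
    """
    children = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:
            parent, child = line.split("\t")
            children[parent].append(child)
    return dict(children)


def load_file(path, parser):
    """Read one taxonomy file from disk and parse it with the given parser."""
    with open(path, encoding="utf-8") as f:
        return parser(f)


def subtopics(node_id, children, terms):
    """Names of the direct subtopics of node_id (empty list for leaf nodes)."""
    return [terms[c] for c in children.get(node_id, [])]
```

For example, load_file(taxo_path, parse_taxonomy) and load_file(terms_path, parse_terms) would give the adjacency list and the name lookup, after which subtopics(node_id, children, terms) lists the names of a node's direct children. Separating parsing from file reading keeps the parsers easy to test on in-memory lines.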

# Training data
The training data used in our experiments are also provided in the Dataset folder.

# Prompt for core topic selection
The prompt we use to select core topics from the candidate set is provided in the following file:
- topic_selection_prompt.txt
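How the prompt is filled in is not specified here. As a minimal sketch, assuming the text of topic_selection_prompt.txt serves as an instruction prefix followed by the document and its numbered candidate topics (the file may instead define its own placeholders), a prompt could be assembled as follows. build_selection_prompt and load_template are hypothetical helpers, not part of the released code.

```python
def load_template(path="topic_selection_prompt.txt"):
    """Read the prompt template text from disk."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def build_selection_prompt(template, document, candidates):
    """Compose a core-topic-selection prompt (hypothetical layout).

    The template is used as an instruction prefix, followed by the document
    text and a numbered list of candidate topics.
    """
    numbered = "\n".join(f"{i}. {topic}" for i, topic in enumerate(candidates, start=1))
    return (
        f"{template.strip()}\n\n"
        f"Document:\n{document.strip()}\n\n"
        f"Candidate topics:\n{numbered}\n"
    )
```

Numbering the candidates makes it straightforward to map the model's answer back to taxonomy nodes, but the exact output format expected by the released code should be taken from the prompt file itself.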

# Code details
Place the Dataset folder inside the TSI folder, then run the scripts in the following order:
1. Preprocessing step: generate the dataset, phrase, and taxonomy files.
    1. preprocess.py
2. Index construction step: construct the semantic index files.
    1. core_topic_identification.py
    2. indicative_phrase_extraction.py
3. Index-grounded fine-tuning step: train the TSI module.
    1. lexical_negative_generation.py
    2. Indexing_network_warmup.py
    3. Index_grounded_fine_tuning.py
- The code for backbone embedding generation (embedding_generation.py) and fine-tuning (FFT.py) is also available in the Utils folder.
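The run order of the three steps can be scripted roughly as follows. This is a hypothetical helper, not part of the released code: it assumes the scripts sit at the top level of the TSI folder, that the dataset folder is named Dataset, and that each script runs without command-line arguments. The Utils scripts are left out because the steps do not place them in the pipeline.

```python
import subprocess
import sys
from pathlib import Path

# Script order follows the three steps; running each script with no
# arguments is an assumption, so check them for required options.
PIPELINE = [
    ("preprocess", ["preprocess.py"]),
    ("index construction", ["core_topic_identification.py",
                            "indicative_phrase_extraction.py"]),
    ("index-grounded fine-tuning", ["lexical_negative_generation.py",
                                    "Indexing_network_warmup.py",
                                    "Index_grounded_fine_tuning.py"]),
]


def pipeline_commands():
    """One [python, script] command per script, in execution order."""
    return [[sys.executable, script] for _, scripts in PIPELINE for script in scripts]


def run_pipeline(tsi_dir="."):
    tsi_dir = Path(tsi_dir)
    # The dataset folder is expected inside the TSI folder.
    if not (tsi_dir / "Dataset").is_dir():
        raise FileNotFoundError(f"expected {tsi_dir / 'Dataset'} to exist")
    for cmd in pipeline_commands():
        print("running:", " ".join(cmd))
        # cwd=tsi_dir lets the bare script names resolve inside the TSI folder;
        # check=True stops the run as soon as one step fails.
        subprocess.run(cmd, check=True, cwd=tsi_dir)


if __name__ == "__main__":
    run_pipeline(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved under a name such as run_pipeline.py (hypothetical), it would be invoked as python run_pipeline.py path/to/TSI. Stopping at the first failure avoids running later steps on missing or partial outputs from earlier ones.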