Building an HPC blast pipeline
Log in to the HPC (at the University of Arizona) with your user name.
Create a directory for the pipeline.
Go to the hpc-blast directory and create subdirectories to hold std-err, std-out, data, blastdb, blast-results, and scripts.
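The directory setup might look like the sketch below. The location under $HOME is an assumption (put the pipeline wherever you normally work on the cluster); the subdirectory names are the ones listed above.

```shell
#!/bin/bash
# Sketch: create the pipeline directory and its subdirectories.
# PIPELINE_DIR is an assumed location; change it to suit your account.
PIPELINE_DIR="${PIPELINE_DIR:-$HOME/hpc-blast}"

mkdir -p "$PIPELINE_DIR"
cd "$PIPELINE_DIR" || exit 1

# One subdirectory per kind of file the pipeline reads or writes.
mkdir -p std-err std-out data blastdb blast-results scripts

# List the subdirectories to confirm they exist.
ls -d */
```

Using mkdir -p means the commands can be rerun safely if some of the directories already exist.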
Go to the blastdb directory and create the blast database.
Copy the input files over from iPlant.
We will be using the day/night transcriptomes we discussed last week.
You will use two of the Perl scripts from the homework in the hpc pipeline.
Put these into the scripts directory.
config.sh: Now we need to create the config.sh script.
This script will contain all of the environment variables that the user will be able to modify for running blast.
The idea of this script is to give the user one place to make any changes they need to run their particular blast job.
The user should not have to go in and change any of the scripts in the 'scripts' directory.
Instead, we will capture all of these variables here.
* Note that the config.sh script is in the hpc-blast directory (in the first part of the pipeline), where it is easy for the user to modify it.
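As a rough illustration, a config.sh along these lines would keep every user-editable setting in one place. All of the variable names and values here are hypothetical placeholders, not the ones used in the course pipeline; only the directory names come from the steps above.

```shell
#!/bin/bash
# config.sh (sketch): every name and value below is a hypothetical placeholder.

# Top-level pipeline directory; the other paths are derived from it.
export BASE_DIR="${BASE_DIR:-$HOME/hpc-blast}"

export SCRIPT_DIR="$BASE_DIR/scripts"
export DATA_DIR="$BASE_DIR/data"
export BLASTDB_DIR="$BASE_DIR/blastdb"
export RESULTS_DIR="$BASE_DIR/blast-results"
export STDERR_DIR="$BASE_DIR/std-err"
export STDOUT_DIR="$BASE_DIR/std-out"

# BLAST settings a user is likely to tune (values are illustrative only).
export EVALUE="1e-5"
export NUM_THREADS=4

# Number of sequences per chunk when splitting the FASTA input (illustrative).
export SPLIT_SIZE=1000
```

Because the other scripts only read these variables, a user who wants a different e-value or data directory edits this one file and leaves the 'scripts' directory alone.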
blast-pipeline.sh: Now we will create the "driver script", the script that is used to run each part of the pipeline.
run-fasta-splitter.sh: Create a shell script that runs the fasta splitter Perl code.
run-blast.sh: Create a shell script that runs blast on a compute node.
run-parse-blast.sh: Create a shell script that parses the blast output and runs on a compute node on the cluster.
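One way the driver script (blast-pipeline.sh) could tie these three job scripts together is sketched below. It assumes a PBS scheduler, where qsub prints the ID of the submitted job and "-W depend=afterok:<id>" holds a job until the named job finishes successfully. The submit helper, the DRY_RUN switch, and the assumption that the run-*.sh scripts live in the scripts directory are all illustrative, not taken from the course pipeline.

```shell
#!/bin/bash
# blast-pipeline.sh (sketch). Assumes PBS: qsub prints the new job's ID and
# "-W depend=afterok:<id>" delays a job until job <id> has succeeded.

# Load user settings if a config.sh sits next to this script.
if [ -f ./config.sh ]; then
    . ./config.sh
fi

# Hypothetical helper: submit a job script, or only print the command when
# DRY_RUN=1 or qsub is not installed, so the chain can be checked off-cluster.
# Usage: submit <job-script> [id of the job to wait for]
submit() {
    script="$1"
    after="$2"
    depend=""
    if [ -n "$after" ]; then
        depend="-W depend=afterok:$after"
    fi
    if [ "${DRY_RUN:-0}" = "1" ] || ! command -v qsub >/dev/null 2>&1; then
        echo "would run: qsub ${depend:+$depend }$script" >&2
        echo "dryrun-$(basename "$script" .sh)"   # stand-in job ID
    else
        # $depend is unquoted on purpose so it splits into two arguments.
        qsub $depend "$script"
    fi
}

# Assumed location of the job scripts.
SCRIPT_DIR="${SCRIPT_DIR:-./scripts}"

# Each step waits for the one before it.
SPLIT_ID=$(submit "$SCRIPT_DIR/run-fasta-splitter.sh")
BLAST_ID=$(submit "$SCRIPT_DIR/run-blast.sh" "$SPLIT_ID")
PARSE_ID=$(submit "$SCRIPT_DIR/run-parse-blast.sh" "$BLAST_ID")

echo "Submitted: $SPLIT_ID -> $BLAST_ID -> $PARSE_ID"
```

Running it with DRY_RUN=1 prints the qsub commands without submitting anything, which is a cheap way to check the order of the steps before using cluster time.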
04-blast-search.pl: Modify the blast search script. Update the e-value to be less stringent. Modify the Perl script 04-blast-search.pl to include additional information about the hit.
Make all of the scripts executable and then run the pipeline. If you don't get any errors, the pipeline will take less than a day to run.
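A minimal sketch of this step, assuming the shell scripts end in .sh and the Perl scripts end in .pl, and that you run it from the hpc-blast directory. The make_executable helper is a hypothetical name used only for this example.

```shell
#!/bin/bash
# Sketch: set the execute bit on every .sh and .pl file under a directory.
# make_executable is a hypothetical helper; the extensions are assumptions.
make_executable() {
    find "${1:-.}" -type f \( -name '*.sh' -o -name '*.pl' \) -exec chmod +x {} +
}

# Run from the hpc-blast directory.
make_executable .

# Start the pipeline only if the driver script is present and executable.
if [ -x ./blast-pipeline.sh ]; then
    ./blast-pipeline.sh
fi
```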
Check to see if the pipeline is running.
Once the pipeline has finished running (and you no longer see the jobs listed under qstat), check std-err and std-out for errors and output. Note that you can check while the pipeline is running too.
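The checks above might be scripted roughly as follows. This assumes a PBS qstat (where -u restricts the listing to one user) and that the job scripts write their error files into the std-err directory; STDERR_DIR and nonempty_files are hypothetical names for this example.

```shell
#!/bin/bash
# Sketch: see whether jobs are still queued, then look for error output.
# STDERR_DIR is an assumed name; point it at your std-err directory.
STDERR_DIR="${STDERR_DIR:-./std-err}"

# List your own jobs; skipped on machines where qstat is not installed.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}"
fi

# Print the path of every non-empty file under the given directory.
# No output means no job has written anything to standard error.
nonempty_files() {
    find "$1" -type f -size +0 2>/dev/null
}

nonempty_files "$STDERR_DIR"
```

Any file this prints is worth opening, since a job that ran cleanly usually leaves its standard-error file empty.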
Now you try. See if you can update the pipeline to add one final step that puts all of the parsed blast output into a single file for each of the original fasta files. Note that you can comment out the parts of blast-pipeline.sh that you have already run in the steps above to test your code. Otherwise, these will run again. Also, before you run this final step, make sure the blast parsing is complete. Can this run on the head node, or should we run this on a compute node?
The complete pipeline can be found on GitHub here.
Commit your code to your own GitHub abe487 repository using the directory: hpc-blast
