Read_Mapping
The main function of the read_mapping_start.sh script is to submit a series of QSub jobs to the Portable Batch System job scheduler for read mapping with the Burrows-Wheeler Aligner (BWA). Although it is a shell script, read_mapping_start.sh requires a list of samples to run. It can also index a Fasta file using BWA. To run read_mapping_start.sh, you would type:
./read_mapping_start.sh
This will display a usage message describing the arguments for read_mapping_start.sh.
There are two subroutines for read_mapping_start.sh: map and index. The argument lists below are split by subroutine.
| Mapping Argument | Function |
|---|---|
| map | Start the read mapping process using BWA |
| Scratch | A directory to put the finished SAM files from BWA |
| Reference Genome | The genome to base the read mapping off of. The genome must be indexed before read mapping can happen |
| Sample Info | A list of samples for read_mapping_start.sh to work with |
| Project | The name of the project or capture facility, used for the Read Group header |
| Platform | The platform used for sequencing, used for the Read Group header |
| Email | An email address for the QSub scheduler to notify you of starts, ends, and abortions for each read mapping |
| Indexing Argument | Function |
|---|---|
| index | Start the indexing process using BWA |
| Reference Genome | The genome to be indexed, must be in Fasta format |
| Email | An email address for the QSub scheduler to notify you of starts, ends, and abortions for indexing |
All arguments must be passed in the correct order (top to bottom for each list) for read_mapping_start.sh to work. For example, say this script is in the directory ~/sequence_handling; to index a genome called 'reference_genome.fasta' in the directory ~/genomes and have the QSub scheduler notify user@github.com, we would type:
./read_mapping_start.sh index ~/genomes/reference_genome.fasta user@github.com
To map the Illumina-sequenced samples listed in the file 'trimmed_samples.txt', stored in the directory ~/trimmed_samples, for our 'Genetics' project, send the SAM files to the directory ~/mapped_SAM, use the reference genome 'reference_genome.fasta' stored in the directory ~/genomes, and email user@github.com any notifications from the QSub scheduler, we would type:
./read_mapping_start.sh map ~/mapped_SAM ~/genomes/reference_genome.fasta ~/trimmed_samples/trimmed_samples.txt Genetics Illumina user@github.com
Please note: the script expects forward reads to have the extension '_R1_trimmed.fq.gz' and reverse reads to have the extension '_R2_trimmed.fq.gz'. If your files do NOT have these extensions, please edit the forward and reverse naming extensions on lines 82 and 83 of the script using your favorite text editor.
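
The naming convention in the note above can be illustrated with a short POSIX shell sketch. This is not taken from read_mapping_start.sh: the variable names `FORWARD_EXT` and `REVERSE_EXT` and the helpers `sample_name` and `reverse_path` are hypothetical, and the sketch simply assumes that a reverse-read file sits next to its forward-read file and differs only in the extension.

```shell
# Extensions from the note above; adjust them if your files are named differently.
# These variable names are hypothetical, not the ones used inside the script.
FORWARD_EXT='_R1_trimmed.fq.gz'
REVERSE_EXT='_R2_trimmed.fq.gz'

# Hypothetical helper: print the sample name for a forward-read file,
# i.e. the file name with its directory and forward extension removed.
sample_name() {
    basename "$1" "$FORWARD_EXT"
}

# Hypothetical helper: print the reverse-read path that would pair with
# a forward-read path, assuming only the extension differs.
reverse_path() {
    stem=${1%"$FORWARD_EXT"}
    printf '%s\n' "$stem$REVERSE_EXT"
}
```

For example, `reverse_path ~/trimmed_samples/sampleA_R1_trimmed.fq.gz` prints the same path ending in `sampleA_R2_trimmed.fq.gz`, which can be handy for checking that every forward file has a partner before submitting jobs.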
The index subroutine for read_mapping_start.sh generates an index file for a reference genome in the same directory as the reference genome. Please make sure you have write permissions for said directory.
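
Since the index is written next to the reference genome, it may be worth checking the directory before submitting the indexing job. The following is a minimal sketch in POSIX shell; the function name `can_write_index` is hypothetical and is not part of read_mapping_start.sh.

```shell
# Hypothetical helper, not part of read_mapping_start.sh.
# Succeeds only if the reference file exists and its directory is writable
# by the current user.
can_write_index() {
    ref="$1"
    ref_dir=$(dirname "$ref")
    [ -f "$ref" ] && [ -w "$ref_dir" ]
}
```

A possible use would be `can_write_index ~/genomes/reference_genome.fasta || echo "cannot write index here" >&2` before calling the index subroutine.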
The map subroutine generates aligned SAM files for each sample. These SAM files have the '@SQ', '@RG', and '@PG' headers included in them. The '@HD' header is not generated from this process.
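
To confirm which header record types a finished SAM file actually contains, one option is to list the distinct tags at the start of its header lines. The sketch below uses only grep, cut, and sort; the function name `sam_header_types` is hypothetical. It relies on the SAM convention that header lines begin with '@' and are tab-delimited.

```shell
# Hypothetical helper: print the distinct header record types
# (for example @PG, @RG, @SQ) found in a SAM file, one per line, sorted.
sam_header_types() {
    grep '^@' "$1" | cut -f 1 | sort -u
}
```

Running `sam_header_types` on one of the files in the scratch directory should list '@SQ', '@RG', and '@PG' if mapping behaved as described above.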
read_mapping_start.sh does not generate a list of the SAM files it produces. To create one, please use sample_list_generator.sh. A list of SAM files is required for the SAM_Processing scripts.
read_mapping_start.sh depends on BWA and the Portable Batch System to run. If you want to use a different job scheduler or read mapper, you will need to modify this script extensively.
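
A quick way to see whether these dependencies are available is to check that their commands are on your PATH. This is a minimal sketch; the function name `missing_tools` is hypothetical, and it assumes the two dependencies are reachable as the commands `bwa` and `qsub`.

```shell
# Hypothetical helper: print the name of each listed command
# that cannot be found on PATH. Prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools bwa qsub` prints nothing when both commands are found, and otherwise prints the names of those that are missing.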
Next: SAM_Processing