Read_Mapping
Read_Mapping submits a task array of jobs through QSub to the Portable Batch System (PBS) job scheduler and maps reads using the Burrows Wheeler Aligner (BWA). It can also index a FASTA file using BWA. To run Read_Mapping, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Read_Mapping can be submitted to the job scheduler with the following command:
sequence_handling Read_Mapping Config

where Config is the full file path to the configuration file.
The following is a list of variables that must be defined within Config. In addition to these handler-specific variables, all common variables must be defined.
| Variable | Function | Default Value |
|---|---|---|
| RM_QSUB | QSub settings for batch submission. Recommended settings are "mem=8gb,nodes=1:ppn=8,walltime=16:00:00". | |
| TRIMMED_LIST | A list of adapter-trimmed or quality-trimmed samples to read map. This will be ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt (Adapter_Trimming) or ${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt (Quality_Trimming). | |
| FORWARD_TRIMMED | Suffix for forward reads. This will be _Forward_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R1_trimmed.fastq.gz (Quality_Trimming). | |
| REVERSE_TRIMMED | Suffix for reverse reads. This will be _Reverse_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R2_trimmed.fastq.gz (Quality_Trimming). | |
| SINGLES_TRIMMED | Suffix for single reads. This will be _Single_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _single_trimmed.fastq.gz (Quality_Trimming). | |
| THREADS | How many threads to use. | 1 |
| SEED | Minimum seed length. | 19 |
| WIDTH | Band width. | 100 |
| DROPOFF | Off-diagonal x-dropoff (Z-dropoff). | 100 |
| RE_SEED | Re-seed value. | 1.5 |
| CUTOFF | Cutoff value. | 10000 |
| MATCH | Matching score. | 1 |
| MISMATCH | Mismatch penalty. | 4 |
| GAP | Gap penalty. | 6 |
| EXTENSION | Gap extension penalty. | 1 |
| CLIP | Clipping penalty. | 6 |
| UNPAIRED | Unpaired read penalty. | 9 |
| INTERLEAVED | Is the first input query interleaved? | false |
| THRESHOLD | Minimum threshold. | 30 |
| SECONDARY | Output all alignments and mark as secondary. | false |
| APPEND | Append FastA/Q comments to SAM files. | false |
| HARD | Use hard clipping. | false |
| SPLIT | Mark split hits as secondary. | true |
| VERBOSITY | Verbosity level. Choose from 'disabled', 'errors', 'warnings', 'all', or 'debug'. | 'all' |
Note: if running single-end samples, leave FORWARD_TRIMMED and REVERSE_TRIMMED blank. If running paired-end samples, leave SINGLES_TRIMMED blank.
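As an illustration, the handler-specific part of a configuration for paired-end samples coming out of Quality_Trimming might look like the fragment below. This is only a sketch: it assumes Config is a shell-style file of VAR=value assignments and that OUT_DIR and PROJECT are set among the common variables. The values /path/to/output and MyProject are placeholders, and setting THREADS to 8 is an assumption made here to match ppn=8 in the recommended RM_QSUB settings.

```shell
# Hypothetical common variables (placeholders; normally defined elsewhere in Config)
OUT_DIR=/path/to/output
PROJECT=MyProject

# Read_Mapping settings for paired-end samples from Quality_Trimming
RM_QSUB="mem=8gb,nodes=1:ppn=8,walltime=16:00:00"
TRIMMED_LIST="${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt"
FORWARD_TRIMMED=_R1_trimmed.fastq.gz
REVERSE_TRIMMED=_R2_trimmed.fastq.gz
SINGLES_TRIMMED=    # left blank for paired-end samples
THREADS=8           # assumption: matched to ppn=8 in RM_QSUB
```

For single-end samples the roles would be reversed: SINGLES_TRIMMED would carry the suffix and the forward and reverse suffixes would be left blank.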
If your reference genome is not indexed, Read_Mapping generates the index files for the reference genome in the same directory as the reference genome. Please make sure you have write permissions for that directory. After indexing, Read_Mapping will exit, so you will need to run Read_Mapping again to map reads.
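Before submitting, it can be useful to know whether the first run will index or map. The sketch below is not part of sequence_handling; it checks for the five files that bwa index writes next to a FASTA file (.amb, .ann, .bwt, .pac, .sa) and tests whether the reference directory is writable. The function names and the example path are hypothetical.

```shell
#!/bin/bash
# Return 0 if all five BWA index files exist next to the reference FASTA,
# 1 otherwise.
reference_is_indexed() {
    local ref="$1"
    local ext
    for ext in amb ann bwt pac sa; do
        [ -f "${ref}.${ext}" ] || return 1
    done
    return 0
}

# Return 0 if the directory holding the reference is writable, which
# indexing needs because the index is written alongside the FASTA.
reference_dir_writable() {
    [ -w "$(dirname "$1")" ]
}

# Example usage with a hypothetical path:
# ref=/path/to/reference.fasta
# if ! reference_is_indexed "$ref"; then
#     reference_dir_writable "$ref" || echo "Cannot write index next to $ref" >&2
# fi
```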
Read_Mapping also generates aligned SAM files for each sample, located under

${OUT_DIR}/Read_Mapping/${SAMPLE}.sam

where ${OUT_DIR} is specified in the configuration file. These SAM files have the '@SQ', '@RG', and '@PG' headers included in them. The '@HD' header is not generated from this process.
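One way to confirm which headers a given output file carries is to list the distinct header record types it contains. The helper below is a minimal sketch using only grep, cut, and sort; the function name and the example path are hypothetical and not part of sequence_handling.

```shell
#!/bin/bash
# Print the distinct header record types (for example @PG, @RG, @SQ)
# found in a SAM file. Header lines begin with '@' followed by a
# two-letter record type, and fields are tab-separated.
sam_header_types() {
    grep '^@[A-Z][A-Z]' "$1" | cut -f 1 | sort -u
}

# Example usage with a hypothetical path:
# sam_header_types /path/to/output/Read_Mapping/sample.sam
```

If '@HD' is absent from the listing, that matches the behavior described above.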
Read_Mapping does not generate a list of its output files. However, SAM_Processing does not require a sample list, only a directory containing all the samples to be processed.
Read_Mapping depends on the Burrows Wheeler Aligner and the Portable Batch System to run. If you want to use a different job scheduler or read mapper, you will need to modify this script extensively.
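For anyone adapting the script, it may help to see how the alignment settings in the table line up with BWA's own options. The sketch below assembles a bwa mem command line from a subset of them. The flag correspondences are inferred from the variable descriptions and from the options bwa mem documents; they are not taken from the Read_Mapping source. The file paths are placeholders, and the command is printed rather than run.

```shell
#!/bin/bash
# Values copied from the defaults in the table above.
THREADS=1; SEED=19; WIDTH=100; DROPOFF=100; RE_SEED=1.5; CUTOFF=10000
MATCH=1; MISMATCH=4; GAP=6; EXTENSION=1; CLIP=6; UNPAIRED=9; THRESHOLD=30
SPLIT=true; SECONDARY=false

# Hypothetical inputs (placeholders).
REF=/path/to/reference.fasta
FWD=/path/to/sample_R1_trimmed.fastq.gz
REV=/path/to/sample_R2_trimmed.fastq.gz

# Assumed mapping of settings to bwa mem options.
cmd="bwa mem -t $THREADS -k $SEED -w $WIDTH -d $DROPOFF -r $RE_SEED"
cmd="$cmd -c $CUTOFF -A $MATCH -B $MISMATCH -O $GAP -E $EXTENSION"
cmd="$cmd -L $CLIP -U $UNPAIRED -T $THRESHOLD"

# Boolean settings become bare flags only when enabled.
if [ "$SPLIT" = true ]; then cmd="$cmd -M"; fi       # mark split hits as secondary
if [ "$SECONDARY" = true ]; then cmd="$cmd -a"; fi   # output all alignments

cmd="$cmd $REF $FWD $REV"
echo "$cmd"
```

Option defaults and meanings have changed between BWA releases, so the output of bwa mem run without arguments on your installed version is the authority on what each flag does.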
Next: SAM_Processing