Skip to content

Read_Mapping

Paul Hoffman edited this page Nov 29, 2016 · 21 revisions

Basic Usage

Read_Mapping starts a task array of QSub job submissions to the Portable Batch System job scheduler for lab-long to read map using the Burrows-Wheeler Aligner (BWA-MEM). It can also index a FastA file using BWA if the provided reference is not already indexed. To run Read_Mapping, all common and wrapper-specific variables must be defined within the configuration file. Once the variables have been defined, Read_Mapping can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Read_Mapping Config

Where Config is the full file path to the configuration file.

Handler-Specific Variables

The following are a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined. The default parameters can be found on the BWA manual page and are designed for human genomes. Parameters may need to be adjusted on a per-species basis.

Variable Function Default Value
RM_QSUB QSub settings for batch submission. Recommended settings are "mem=8gb,nodes=1:ppn=8,walltime=48:00:00".
TRIMMED_LIST A list of adapter-trimmed or quality-trimmed samples to read map. This will be ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt (Adapter_Trimming) or ${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt (Quality_Trimming).
FORWARD_TRIMMED Shared suffix for forward reads. This will be _Forward_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R1_trimmed.fastq.gz (Quality_Trimming).
REVERSE_TRIMMED Shared suffix for reverse reads. This will be _Reverse_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R2_trimmed.fastq.gz (Quality_Trimming).
SINGLES_TRIMMED Shared suffix for single reads. This will be _Single_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _single_trimmed.fastq.gz (Quality_Trimming).
THREADS How many threads to use. 1
SEED Minimum seed length. 19
WIDTH Band width. 100
DROPOFF Off-diagonal x-dropoff (Z-dropoff). 100
RE_SEED Re-seed value. 1.5
CUTOFF Cutoff value. 10000
MATCH Matching score. 1
MISMATCH Mismatch penalty. 4
GAP Gap penalty. 6
EXTENSION Gap extension penalty. 1
CLIP Clipping penalty. 6
UNPAIRED Unpaired read penalty. 9
RESCUE Attempt to rescue missing hits in paired-end mode? Note: this means that reads may not be matched false
INTERLEAVED Is the first input query interleaved? false
RM_THRESHOLD Minimum threshold. 30
SECONDARY Output all alignments and mark as secondary. false
APPEND Append FastA/Q comments to SAM files. false
HARD Use hard clipping. false
SPLIT Mark split hits as secondary. true
VERBOSITY Verbosity level. Choose from 'disabled', 'errors', 'warnings', 'all', or 'debug'. 'all'

Note: If running single-end samples, leave FORWARD_TRIMMED and REVERSE_TRIMMED filled with values that do NOT match your samples. If running paired-end samples, leave SINGLES_TRIMMED filled with values that do NOT match your samples.

Output

If your reference genome is not indexed, Read_Mapping generates an index file for the reference genome in the same directory as the reference genome. Please make sure you have write permissions for said directory. After indexing Read_Mapping will exit, so you will need to run Read_Mapping again to map reads.

Read_Mapping generates aligned SAM files for each sample, located under

${OUT_DIR}/Read_Mapping/${SAMPLE}.sam

where ${OUT_DIR} is specified in the configuration file. These SAM files have the '@SQ', '@RG', and '@PG' headers included in them. The '@HD' header is not generated from this process.

A list of files is not generated from Read_Mapping. However, you can generate one using sample_list_generator.sh.

Dependencies

Read_Mapping depends on the Burrows-Wheeler Aligner and the Portable Batch System to run. If you want to use a different job scheduler or read mapper, you will need to modify this script extensively. Future implementations of Read_Mapping using Bowtie 2 are in progress.

Clone this wiki locally