Read_Mapping

Skylar Wyant edited this page Jun 20, 2016 · 21 revisions

Basic Usage

Read_Mapping submits a task array of jobs through QSub to the Portable Batch System (PBS) job scheduler to map reads with the Burrows-Wheeler Aligner (BWA). It can also index a FASTA file using BWA. To run Read_Mapping, all common and handler-specific variables must be defined in the configuration file. Once the variables have been defined, submit Read_Mapping to the job scheduler with the following command:

./sequence_handling Read_Mapping Config

Where Config is the full file path to the configuration file.
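Because Config must be given as a full file path, it can help to check the path before submitting. The following is a minimal sketch, not part of sequence_handling: the function name `submit_read_mapping` and the example path are hypothetical.

```shell
# Hypothetical location of the configuration file; adjust to your project.
CONFIG="${HOME}/my_project/Config"

# Hypothetical helper: refuse to submit unless the path is absolute
# and points at an existing file, then run the documented command.
submit_read_mapping() {
    config="$1"
    case "${config}" in
        /*) ;;  # absolute path, as the usage note requires
        *)  echo "Config must be a full path: ${config}" >&2; return 1 ;;
    esac
    if [ ! -f "${config}" ]; then
        echo "Config not found: ${config}" >&2
        return 1
    fi
    ./sequence_handling Read_Mapping "${config}"
}

# Usage (run from the sequence_handling directory):
# submit_read_mapping "${CONFIG}"
```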

Handler-Specific Variables

The following variables need to be defined within Config. In addition to these handler-specific variables, all common variables must be defined.

| Variable | Function | Default Value |
| --- | --- | --- |
| RM_QSUB | QSub settings for batch submission. Recommended settings are "mem=8gb,nodes=1:ppn=8,walltime=36:00:00". | |
| TRIMMED_LIST | A list of adapter-trimmed or quality-trimmed samples to read map. This will be ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt (Adapter_Trimming) or ${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt (Quality_Trimming). | |
| FORWARD_TRIMMED | Suffix for forward reads. This will be _Forward_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R1_trimmed.fastq.gz (Quality_Trimming). | |
| REVERSE_TRIMMED | Suffix for reverse reads. This will be _Reverse_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R2_trimmed.fastq.gz (Quality_Trimming). | |
| SINGLES_TRIMMED | Suffix for single reads. This will be _Single_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _single_trimmed.fastq.gz (Quality_Trimming). | |
| THREADS | How many threads to use. | 1 |
| SEED | Minimum seed length. | 19 |
| WIDTH | Band width. | 100 |
| DROPOFF | Off-diagonal x-dropoff (Z-dropoff). | 100 |
| RE_SEED | Re-seed value. | 1.5 |
| CUTOFF | Cutoff value. | 10000 |
| MATCH | Matching score. | 1 |
| MISMATCH | Mismatch penalty. | 4 |
| GAP | Gap penalty. | 6 |
| EXTENSION | Gap extension penalty. | 1 |
| CLIP | Clipping penalty. | 6 |
| UNPAIRED | Unpaired read penalty. | 9 |
| INTERLEAVED | Is the first input query interleaved? | false |
| THRESHOLD | Minimum threshold. | 30 |
| SECONDARY | Output all alignments and mark as secondary. | false |
| APPEND | Append FastA/Q comments to SAM files. | false |
| HARD | Use hard clipping. | false |
| SPLIT | Mark split hits as secondary. | true |
| VERBOSITY | Verbosity level. Choose from 'disabled', 'errors', 'warnings', 'all', or 'debug'. | 'all' |
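The alignment variables in the table read like the options of `bwa mem`. The sketch below shows one way the table defaults could be turned into a `bwa mem` option string. The pairing of each Config variable with a flag is an assumption inferred from the descriptions, not taken from the Read_Mapping script, and the function name `bwa_mem_options` is hypothetical. The flags themselves are standard `bwa mem` options.

```shell
# Defaults copied from the table above.
THREADS=1; SEED=19; WIDTH=100; DROPOFF=100; RE_SEED=1.5; CUTOFF=10000
MATCH=1; MISMATCH=4; GAP=6; EXTENSION=1; CLIP=6; UNPAIRED=9; THRESHOLD=30
INTERLEAVED=false; SECONDARY=false; APPEND=false; HARD=false; SPLIT=true
VERBOSITY='all'

# ASSUMPTION: the variable-to-flag pairing below is inferred, not confirmed.
bwa_mem_options() {
    opts="-t ${THREADS} -k ${SEED} -w ${WIDTH} -d ${DROPOFF} -r ${RE_SEED} -c ${CUTOFF}"
    opts="${opts} -A ${MATCH} -B ${MISMATCH} -O ${GAP} -E ${EXTENSION}"
    opts="${opts} -L ${CLIP} -U ${UNPAIRED} -T ${THRESHOLD}"

    # Boolean settings become bare flags only when set to "true".
    if [ "${INTERLEAVED}" = "true" ]; then opts="${opts} -p"; fi
    if [ "${SECONDARY}" = "true" ]; then opts="${opts} -a"; fi
    if [ "${APPEND}" = "true" ]; then opts="${opts} -C"; fi
    if [ "${HARD}" = "true" ]; then opts="${opts} -H"; fi
    if [ "${SPLIT}" = "true" ]; then opts="${opts} -M"; fi

    # bwa expresses verbosity as an integer level.
    case "${VERBOSITY}" in
        disabled) level=0 ;;
        errors)   level=1 ;;
        warnings) level=2 ;;
        all)      level=3 ;;
        debug)    level=4 ;;
        *) echo "Unknown VERBOSITY: ${VERBOSITY}" >&2; return 1 ;;
    esac
    echo "${opts} -v ${level}"
}

# Print the option string for the defaults.
bwa_mem_options
```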

Note: if running single-end samples, leave FORWARD_TRIMMED and REVERSE_TRIMMED as the default. If running paired-end samples, leave SINGLES_TRIMMED as the default.
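As an illustration, the input-related variables for a paired-end project that went through Adapter_Trimming might be set as follows. This is a sketch that assumes Config uses shell-style variable assignments. The values of `OUT_DIR` and `PROJECT` are placeholders, and the remaining values are taken from the table above.

```shell
# Placeholder values for the common variables; replace with your own.
OUT_DIR="/path/to/output"
PROJECT="MyProject"

# Recommended QSub settings from the table.
RM_QSUB="mem=8gb,nodes=1:ppn=8,walltime=36:00:00"

# Sample list written by Adapter_Trimming.
TRIMMED_LIST="${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"

# Suffixes produced by Adapter_Trimming.
FORWARD_TRIMMED="_Forward_ScytheTrimmed.fastq.gz"
REVERSE_TRIMMED="_Reverse_ScytheTrimmed.fastq.gz"
SINGLES_TRIMMED="_Single_ScytheTrimmed.fastq.gz"
```

For samples that went through Quality_Trimming instead, the list path and suffixes would be swapped for the Quality_Trimming values given in the table.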

Output

If your reference genome is not indexed, Read_Mapping generates an index for the reference genome in the same directory as the reference genome. Make sure you have write permission for that directory. After indexing, Read_Mapping exits, so you will need to run Read_Mapping again to map reads.
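To see in advance whether the first run will index or map, you can check for an existing BWA index yourself. The sketch below relies on the fact that `bwa index` writes files with the extensions .amb, .ann, .bwt, .pac, and .sa next to the reference. How Read_Mapping itself decides whether a reference is indexed is not documented here, and both function names are hypothetical.

```shell
# Succeeds only if all five BWA index files exist next to the reference.
reference_is_indexed() {
    ref="$1"
    for ext in amb ann bwt pac sa; do
        if [ ! -f "${ref}.${ext}" ]; then
            return 1
        fi
    done
    return 0
}

# Succeeds if the directory holding the reference is writable,
# which is needed for the index to be created there.
can_write_index() {
    [ -w "$(dirname "$1")" ]
}
```

For example, `reference_is_indexed /path/to/reference.fasta || can_write_index /path/to/reference.fasta` fails only when the reference is not indexed and its directory is not writable (the path is a placeholder).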

Read_Mapping also generates aligned SAM files for each sample, located under

${OUT_DIR}/Read_Mapping/${SAMPLE}.sam

where ${OUT_DIR} is specified in the configuration file. These SAM files include the '@SQ', '@RG', and '@PG' headers. The '@HD' header is not generated by this process.
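A quick way to confirm which header records a SAM file contains is to look at the first tab-separated field of its header lines. The following sketch uses only `awk`. The function names are hypothetical and not part of sequence_handling.

```shell
# Succeeds if the SAM file has at least one header line of the given
# record type (for example SQ, RG, PG, or HD).
sam_has_header() {
    awk -F '\t' -v tag="@${2}" '$1 == tag { found = 1; exit } END { exit !found }' "$1"
}

# Prints one line per record type stating whether it is present.
report_sam_headers() {
    for tag in SQ RG PG HD; do
        if sam_has_header "$1" "${tag}"; then
            echo "@${tag}: present"
        else
            echo "@${tag}: missing"
        fi
    done
}
```

Running `report_sam_headers "${OUT_DIR}/Read_Mapping/${SAMPLE}.sam"` on Read_Mapping output should, per the description above, report '@HD' as missing.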

Read_Mapping does not generate a list of its output files. However, SAM_Processing does not require a sample list, only a directory containing all the samples to be processed.
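If you want a list of the SAM files anyway, for record keeping or for another tool, you can build one from the output directory. This is a minimal sketch. The function name and the output file name in the usage note are hypothetical.

```shell
# Prints the path of every .sam file directly inside the given directory,
# one per line, in the shell's glob order.
list_sam_files() {
    for f in "$1"/*.sam; do
        # Skip the unexpanded pattern when the directory has no .sam files.
        if [ -f "${f}" ]; then
            printf '%s\n' "${f}"
        fi
    done
}
```

For example, `list_sam_files "${OUT_DIR}/Read_Mapping" > "${OUT_DIR}/Read_Mapping/sam_list.txt"` writes the list next to the SAM files.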

Dependencies

Read_Mapping depends on the Burrows-Wheeler Aligner and the Portable Batch System to run. If you want to use a different job scheduler or read mapper, you will need to modify this script extensively.
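Before submitting, you can confirm that both dependencies are available on your PATH. The sketch below assumes the executables are named `bwa` and `qsub`, which are the usual names for BWA and the PBS submission command. The function name `missing_tools` is hypothetical.

```shell
# Prints the name of each requested command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
    done
}

# Report any missing Read_Mapping dependency.
missing=$(missing_tools bwa qsub)
if [ -n "${missing}" ]; then
    echo "Missing dependencies:" ${missing}
fi
```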
