Whole Genome Sequencing Structural Variation Pipeline

Quick start

Install nextflow

curl -fsSL get.nextflow.io | bash
mkdir -p ~/bin
mv ./nextflow ~/bin/

Run the pipeline

nextflow run NBISweden/wgs-structvar --project <uppmax_project_id> --bam <bamfile.bam> --steps manta,fermikit,vep

This runs both manta and fermikit, annotates the results with Variant Effect Predictor (VEP), and creates summary files for everything in the results subdirectory.

We recommend setting the environment variable NXF_WORK, which controls where Nextflow keeps its work directory, to something like:

export NXF_WORK=$SNIC_NOBACKUP/work

Preferably, put this line in your .bashrc so that it is set in every session.
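
One way to make the setting persistent is to append the export line to a shell startup file while guarding against duplicates. This is a minimal sketch: the function name `persist_nxf_work` is hypothetical and not part of the pipeline, and the startup file is passed as an argument so it can be tried on a scratch file first.

```shell
# Append the NXF_WORK export to a shell startup file, but only once.
# persist_nxf_work is a hypothetical helper name, not part of the pipeline.
persist_nxf_work() {
    rc_file="$1"
    # Single quotes keep $SNIC_NOBACKUP unexpanded, so it is resolved at login.
    line='export NXF_WORK=$SNIC_NOBACKUP/work'
    touch "$rc_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Usage (not executed here): persist_nxf_work "$HOME/.bashrc"
```

Running the function twice leaves a single export line in the file, so it is safe to call from a setup script.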

General information

This is a pipeline for running the two structural variation callers fermikit and manta on UPPMAX.

You can choose to run either of the two structural variation callers or both (and generate summary files).
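
The choice of callers is made through the `--steps` option shown in the quick start. The sketch below only prints the resulting command lines and launches nothing. `wgs_sv_cmd` is a hypothetical helper, and `my_project` and `sample.bam` are placeholder values.

```shell
# Print (not run) the pipeline command for a given set of steps.
# wgs_sv_cmd is a hypothetical helper; project and bam values are placeholders.
wgs_sv_cmd() {
    project="$1"
    bam="$2"
    steps="$3"
    printf 'nextflow run NBISweden/wgs-structvar --project %s --bam %s --steps %s\n' \
        "$project" "$bam" "$steps"
}

wgs_sv_cmd my_project sample.bam manta            # manta only
wgs_sv_cmd my_project sample.bam fermikit         # fermikit only
wgs_sv_cmd my_project sample.bam manta,fermikit   # both callers
```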

Masking

The pipeline will use the following mask files to remove known artifacts:

Detailed usage

Command line options

Usage:
    nextflow main.nf --bam <bamfile> [more options]

Options:
  Required:
    --bam           Input bamfile
    --project       UPPMAX project to log cluster time to
  Optional:
    (default values in parentheses where applicable)
    --help          Show this message and exit
    --fastq         Input fastqfile (default is the bam path with fq as file ending)
                    Used by fermikit; created from the bam file if
                    missing.
    --steps         Specify what steps to run, comma separated (manta,vep):
                      Callers: manta, fermikit
                      Annotation: vep, snpeff
    --outdir        Directory where result files are stored (results)
    --prefix        Prefix for result filenames ()
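
Because `--steps` takes a comma-separated list drawn from the four names listed in the usage message, a typo can be caught before a job is submitted. The following is a sketch under that assumption. `check_steps` is a hypothetical helper, and whether the pipeline itself rejects unknown step names is not stated here.

```shell
# Check that every comma-separated step is one of the names in the usage message.
# check_steps is a hypothetical helper, not part of the pipeline.
check_steps() {
    rest="$1,"
    while [ -n "$rest" ]; do
        step="${rest%%,*}"   # text before the first comma
        rest="${rest#*,}"    # text after the first comma
        case "$step" in
            manta|fermikit|vep|snpeff) ;;
            *) echo "unknown step: '$step'" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage: check_steps manta,fermikit,vep && echo "steps look fine"
```

An empty string or an empty element (as in `manta,,vep`) is treated as an error in this sketch.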

Customization

The file nextflow.config can be used to make some further customizations to the workflow.

Most likely only the params scope of the config file needs customizing.

The first part has the default values for the command line parameters; see the usage message for information on them.

The next section has the reference assembly to use, both as fasta and assembly name.

The modules section contains all modules used by the workflow and their versions. Change modules here, not in the main.nf file.

Finally, the runtime section sets the runtimes for the different parts of the workflow. fermikit has its own time specification because it is a very long-running program; otherwise the workflow only distinguishes between callers and the simple single-core supporting jobs.
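
Defaults can also be overridden without editing the shipped file, by writing a small extra config file. The sketch below sets only `outdir` and `prefix`, since Nextflow exposes the `--outdir` and `--prefix` command line options as `params.outdir` and `params.prefix`. The helper name `write_custom_config`, the file name `custom.config`, and both values are placeholders. The names used inside the modules and runtime sections are left out because they are not given in this document.

```shell
# Write a small override config file to the given path.
# write_custom_config is a hypothetical helper; both values are placeholders.
write_custom_config() {
    config_file="$1"
    cat > "$config_file" <<'EOF'
params {
    outdir = 'sv_results'
    prefix = 'sample1_'
}
EOF
}

# Usage (not executed here):
#   write_custom_config custom.config
#   nextflow -c custom.config run NBISweden/wgs-structvar --project <uppmax_project_id> --bam <bamfile.bam>
```

Values given on the command line still take precedence over those in a config file, so a one-off `--outdir` overrides the file.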
