vadr-vscan-ncbi-resp

A Nextflow pipeline that runs VADR v-scan to annotate nucleotide sequences using NCBI-developed VADR model libraries for the following viral species:

Influenza virus
SARS-CoV-2 virus
Respiratory syncytial virus

Dependencies required before using "vadr-vscan-ncbi-resp"

Nextflow - installation instructions

Docker - installation instructions

How vadr-vscan-ncbi-resp pipeline works

The tool compares a FASTA-formatted nucleotide sequence to curated reference models to automatically annotate genomic features (.vadr.tbl) and report potential sequence anomalies (error_alert.tsv). If a Submission Template (.sbt) file and a Source Modifiers Table (.src) file are provided, ASN.1 format (.sqn) files are also generated. These can be used to submit your sequences to NCBI GenBank by email to gb-sub@ncbi.nlm.nih.gov.

Utility tools for creating a samplesheet of the fasta files (if needed)

This pipeline is a containerized application that can automatically scale to utilize the computing resources available (desktop, cloud, or cluster). To run efficiently, it needs a samplesheet in CSV format. This file helps the pipeline process multiple sequences in parallel, depending on the available resources. The samplesheet must indicate the FASTA header and the file path for each query sequence.

Note: the current version of the pipeline requires each query sequence to be stored in a separate FASTA file. Multi-FASTA files are not supported.
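Because each input file must hold exactly one sequence, it can help to check your files before building the samplesheet. The following is a minimal sketch, not part of the pipeline or of utils.zip; the function names `count_fasta_records` and `is_single_fasta` are hypothetical, and the check simply counts header lines that begin with `>`.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Return the number of records (lines starting with '>') in a FASTA file."""
    with Path(fasta_path).open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def is_single_fasta(fasta_path):
    """Return True when the file holds exactly one sequence record."""
    return count_fasta_records(fasta_path) == 1
```

A file for which `is_single_fasta` returns False would need to be split before it is listed in the samplesheet.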

You are welcome to split multi-FASTA files and create the samplesheet manually, but we also provide scripts that do both automatically.

Download utils.zip and unzip.

To split a multi-FASTA file into single FASTA files:

Copy and paste the following command into your terminal, replacing <multi_fasta_file.fasta> with the name of your multi-FASTA file, and <output_directory> with the path to the folder where you want to save your individual FASTA files:

python3 utils/split_multi_fasta.py <multi_fasta_file.fasta> <output_directory>
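For readers who want to see what such a split involves, here is a minimal sketch. It is not the code of `utils/split_multi_fasta.py`, whose internals are not shown here; the function name, the `.fasta` output extension, and the rule for deriving filenames from headers are all assumptions made for illustration.

```python
import re
from pathlib import Path


def split_multi_fasta(multi_fasta, output_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of paths written, in input order.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle_out = None
    with Path(multi_fasta).open() as handle_in:
        for line in handle_in:
            if line.startswith(">"):
                if handle_out is not None:
                    handle_out.close()
                # Assumption: the first whitespace-delimited token of the
                # header is used as the sequence ID and the filename stem.
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else f"record_{len(written) + 1}"
                # Replace characters that are unsafe in filenames.
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                path = out_dir / f"{safe_id}.fasta"
                # Note: records with duplicate IDs would overwrite each other.
                handle_out = path.open("w")
                written.append(path)
            # Lines before the first header are skipped.
            if handle_out is not None:
                handle_out.write(line)
    if handle_out is not None:
        handle_out.close()
    return written
```

The original header line is kept unchanged inside each output file; only the filename is sanitized.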

To create a samplesheet csv file:

Copy and paste the following command into your terminal, replacing <input_fasta_directory> with the path to the folder containing your FASTA files, and <samplesheet_output.csv> with your desired samplesheet filename:

python3 utils/generate_sample_fasta_csv.py <input_fasta_directory> <samplesheet_output.csv>

Input samplesheet.csv example format:

sample,fasta
SAMPLE1,/PATH/TO/SAMPLE1.fasta
SAMPLE2,/PATH/TO/SAMPLE2.fasta
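The two-column format above can also be produced with a few lines of Python. The sketch below is an illustration under stated assumptions, not the provided `utils/generate_sample_fasta_csv.py`: the function names and the set of recognized file extensions are hypothetical, and the `sample` value is taken from the first token of each file's FASTA header, falling back to the filename stem if no header is found.

```python
import csv
from pathlib import Path

# Assumption: these extensions identify FASTA files in the input folder.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna", ".fsa"}


def fasta_header_id(fasta_path):
    """Return the first token of the first header line, or None if absent."""
    with Path(fasta_path).open() as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                return tokens[0] if tokens else None
    return None


def write_samplesheet(fasta_dir, samplesheet_csv):
    """Write a sample,fasta CSV for every FASTA file in fasta_dir."""
    rows = []
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            sample = fasta_header_id(path) or path.stem
            # Absolute paths keep the samplesheet valid from any directory.
            rows.append((sample, str(path.resolve())))
    with open(samplesheet_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fasta"])
        writer.writerows(rows)
    return rows
```

Calling `write_samplesheet("fastas", "samplesheet.csv")` would write one row per FASTA file found in the hypothetical `fastas` folder.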

Running vadr-vscan-ncbi-resp

The input sequences must be nucleotide sequences from one of the currently supported virus species listed above. By default, vadr-vscan-ncbi-resp generates a 5-column annotation table and an annotated sequence in GenBank format (.gbf, with "test" metadata). Copy and paste the following command into your terminal, replacing <samplesheet_output.csv> with the actual name of your samplesheet:

nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input <samplesheet_output.csv> --outdir ./out -profile docker

If you also want to create GenBank submission .sqn files (and .gbf files with complete metadata), provide a Submission Template (.sbt) file and a Source Modifiers Table (.src) file. Copy and paste the following command into your terminal, replacing <samplesheet_output.csv>, <submission_template.sbt> and <source_modifiers.src> with your actual filenames:

nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input <samplesheet_output.csv> --sbt <submission_template.sbt> --src <source_modifiers.src> --outdir ./out -profile docker

Command line options

option	description
`--input /path/to/your/sample_fastas.csv`	(required) path to a csv sample,fasta input file
`--outdir /path/to/output`	(required) output directory
`--vadr_keep`	(optional) keeps all VADR output in the output/vadr directory (SAMPLE_out)
`--sbt <file>`	(optional) path to a GenBank Submission Template (.sbt) file
`--src <file>`	(optional) path to a Source Modifiers Table (.src) file
`-profile docker`	(required) runs the pipeline with the Docker profile
`--vadr_mem XXGB`	(optional) Override the memory requested for the VADR container (default is 36GB)
`--vadr_cpus N`	(optional) Override the number of CPUs (default is 6) available for VADR
`-c /path/to/your/custom.config`	(optional) used to specify a custom configuration file (see Nextflow docs)
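When launching many runs from a script, the options above can be assembled programmatically. The following is a minimal sketch using only the flags documented in this README; the function name `build_nextflow_command` is hypothetical. Rejecting a lone `--sbt` or `--src` is a choice made by this sketch, based on the README's statement that .sqn files are created when both are provided, and is not confirmed pipeline behavior.

```python
import shlex


def build_nextflow_command(samplesheet, outdir, sbt=None, src=None,
                           vadr_keep=False, vadr_mem=None, vadr_cpus=None):
    """Return the nextflow command as an argument list."""
    cmd = [
        "nextflow", "run", "greninger-lab/vadr-vscan-ncbi-resp",
        "-r", "main", "-latest",
        "--input", str(samplesheet),
        "--outdir", str(outdir),
        "-profile", "docker",
    ]
    # Sketch-level check: .sqn generation is documented as needing both files.
    if (sbt is None) != (src is None):
        raise ValueError("--sbt and --src should be provided together")
    if sbt is not None:
        cmd += ["--sbt", str(sbt), "--src", str(src)]
    if vadr_keep:
        cmd.append("--vadr_keep")
    if vadr_mem is not None:
        cmd += ["--vadr_mem", str(vadr_mem)]  # e.g. "48GB"
    if vadr_cpus is not None:
        cmd += ["--vadr_cpus", str(vadr_cpus)]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(shlex.join(build_nextflow_command("samplesheet.csv", "./out")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with paths that contain spaces.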

You can test with the example input FASTA

Download example.zip

unzip example.zip
cd example
nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input example.csv --outdir ./out -profile docker

The default (no "optional" command line options) output directory:

out
├── pipeline_info
├── summary
│   ├── batch_classify_pass_fail.tsv  ← ⚠️ Check this file for VADR pass/fail reports on each sequence
│   └── batch_error_alert.tsv         ← ⚠️ Check here for error alerts for any VADR failed sequences
└── vadr
    ├── MZ054879.fsa
    ├── MZ054879.gbf
    └── MZ054879_out.vadr.tbl
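After a run, you may want to confirm that every sample produced the default files shown in the tree above. The sketch below is one way to do that and is not part of the pipeline. It assumes that output filenames are built from the `sample` column of the samplesheet, as the MZ054879 example suggests; the function name `missing_outputs` is hypothetical.

```python
import csv
from pathlib import Path

# Suffixes of the default per-sample files shown in the output tree.
DEFAULT_SUFFIXES = (".fsa", ".gbf", "_out.vadr.tbl")


def missing_outputs(samplesheet_csv, outdir, suffixes=DEFAULT_SUFFIXES):
    """Map each sample to the expected output files that are absent.

    Samples with all expected files present are left out of the result.
    """
    vadr_dir = Path(outdir) / "vadr"
    missing = {}
    with open(samplesheet_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            sample = row["sample"]
            absent = [
                sample + suffix
                for suffix in suffixes
                if not (vadr_dir / (sample + suffix)).exists()
            ]
            if absent:
                missing[sample] = absent
    return missing
```

An empty dictionary means every listed sample has its three default files; any sample that does appear is a good candidate to look up in the two summary files.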

Using options "--vadr_keep", "--sbt" and "--src"

out
├── pipeline_info
├── summary
│   ├── batch_classify_pass_fail.tsv  ← ⚠️ Check this file for VADR pass/fail reports on each sequence
│   └── batch_error_alert.tsv         ← ⚠️ Check here for error alerts for any VADR failed sequences
└── vadr
    ├── MZ054879.fsa
    ├── MZ054879.gbf
    ├── MZ054879.sqn  ← Created when --sbt <file> and --src <file> are provided
    ├── MZ054879_out  ← Created when --vadr_keep is provided
    │   ├── MZ054879_out.muv.vadr.alc
    │   ├── MZ054879_out.muv.vadr.alt      ← VADR alert file (see below)
    │   ├── MZ054879_out.muv.vadr.alt.list ← VADR alert file (used for generating batch_error_alert.tsv)
    │   └── <additional VADR output files>
    └── MZ054879_out.vadr.tbl

Notes about VADR (v-annotate.pl) error alerts

VADR v-annotate.pl detects and reports alerts for more than 70 types of unexpected sequence characteristics. Documentation for v-annotate.pl can be found here, and extensive documentation for v-annotate.pl alerts is available here.
