Skip to content

greninger-lab/vadr-vscan-ncbi-resp

Repository files navigation

vadr-vscan-ncbi-resp

A Nextflow pipeline for running VADR v-scan to annotate nucleotide sequences using NCBI developed VADR model libraries for the following viral species:

  • Influenza virus
  • Sars-CoV-2 virus
  • Respiratory syncytial virus

Dependencies required before using "vadr-vscan-ncbi-resp"

Nextflow - installation instructions

Docker - installation instructions

How vadr-vscan-ncbi-resp pipeline works

The tool compares a FASTA formatted nucleotide sequence to curated reference models to automatically annotate genomic features (.vadr.tbl) and report potential sequence anomalies (error_alert.tsv). If a Submission Template (.sbt) file and a Source Modifiers Table (.src) file are included, ASN.1 Format (.sqn) files will be generated for NCBI GenBank submission of your sequences by email to gb-sub@ncbi.nlm.nih.gov

Utility tools for creating a samplesheet of the fasta files (if needed)

This pipeline is a containerized application that can automatically scale to utilize the computing resources available (desktop, cloud, or cluster). To run efficiently, it needs a samplesheet in CSV format. This file helps the pipeline process multiple sequences in parallel, depending on the available resources. The samplesheet must indicate the FASTA header and the file path for each query sequence.

Note: the current version of the pipeline requires each query sequence to be stored in a separate FASTA file. Multi-FASTA files are not supported.

You are welcome to split multifasta and create the samplesheet manually, but we also provide scripts to generate it automatically for your convenience.

Download utils.zip and unzip.

To split a multi-FASTA file into single FASTA files:

Copy and paste the following command into your terminal, replacing <multi_fasta_file.fasta> with the name of your multi-FASTA file, and <output_directory> with the path to the folder where you want to save your individual FASTA files:

python3 utils/split_multi_fasta.py <multi_fasta_file.fasta> <output_directory>

To create a samplesheet csv file:

Copy and paste the following command into your terminal, replacing <input_fasta_directory> with the path to the folder containing your FASTA files, and <samplesheet_output.csv> with your desired samplesheet filename:

python3 utils/generate_sample_fasta_csv.py <input_fasta_directory> <samplesheet_output.csv>

Input samplesheet.csv example format:

sample,fasta
SAMPLE1,/PATH/TO/SAMPLE1.fasta
SAMPLE2,/PATH/TO/SAMPLE2.fasta

Running vadr-vscan-ncbi-resp

The input sequences must be nucleotide sequences from one of the currently supported virus species listed above. In the default version vadr-vscan-ncbi-resp will generate a 5-column annotation table and an annotated sequence in .gb format (with "test" metadata). Copy and paste the following command into your terminal, replacing <samplesheet_output.csv> with actual name of your samplesheet:

nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input <samplesheet_output.csv> --outdir ./out -profile docker

However, if you also want to create the genbank submission .sqn files (and .gb files with complete metadata), you should indicate a Submission Template (.sbt) file and a Source Modifiers Table (.src) file. Copy and paste the following command into your terminal, replacing <samplesheet_output.csv>, <submission_template.sbt> and <source_modifiers.src> with your actual filenames:

nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input <samplesheet_output.csv> --sbt <submission_template.sbt> --src <source_modifiers.src> --outdir ./out -profile docker

Command line options

option description
--input /path/to/your/sample_fastas.csv (required) path to a csv sample,fasta input file
--outdir /path/to/output (required) output directory
--vadr_keep (optional) keeps all VADR output in the output/vadr directory (SAMPLE_out)
--sbt <file> (optional) path to a GenBank Submission Template (.sbt) file
--src <file> (optional) path to a Source Modifiers Table (.src) file
-profile docker (required)
--vadr_mem XXGB (optional) Override the memory requested for the VADR container (default is 36GB)
--vadr_cpus N (optional) Override the number of CPUs (default is 6) available for VADR
-c /path/to/your/custom.config (optional) used specify a custom configuration file (see Nextflow docs

You can test with the example input FASTA

Download example.zip

unzip example.zip
cd example
nextflow run greninger-lab/vadr-vscan-ncbi-resp -r main -latest --input example.csv --outdir ./out -profile docker

The default (no "optional" command line options) output directory:

out
├── pipeline_info
├── summary
|   ├── batch_classify_pass_fail.tsv  ← ⚠️ Check this file for VADR pass/fail reports on each sequence
|   └── batch_error_alert.tsv         ← ⚠️ Check here for error alerts for any VADR failed sequences
└── vadr
    ├── MZ054879.fsa
    ├── MZ054879.gbf
    ├── MZ054879_out.vadr.tbl

Using options "--vadr_keep", "--sbt" and "--src"

out
├── pipeline_info
├── summary
|   ├── batch_classify_pass_fail.tsv  ← ⚠️ Check this file for VADR pass/fail reports on each sequence
|   └── batch_error_alert.tsv         ← ⚠️ Check here for error alerts for any VADR failed sequences
└── vadr
    ├── MZ054879.fsa
    ├── MZ054879.gbf
    ├── MZ054879.sqn  ← Created when --sbt <file> and --src <file> are provided
    ├── MZ054879_out  ← Created when --vadr_keep is provided
    │   ├── MZ054879_out.muv.vadr.alc
    │   ├── MZ054879_out.muv.vadr.alt      ← VADR alert file (see below)
    │   ├── MZ054879_out.muv.vadr.alt.list ← VADR alert file (used for generating batch_error_alert.tsv)
    │   ├── <additional VADR output files>
    ├── MZ054879_out.vadr.tbl

Notes about VADR (v-annotate.pl) error alerts

VADR v-annotate.pl detects and reports alerts for more than 70 types of unexpected sequence characteristics. Documentation for v-annotate.pl can be found here, and extensive documentation for v-annotate.pl alerts is available here.

About

Use VADR v-scan to annotate Sars-CoV-2, flu, and RSV sequences.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors