A modular Nextflow pipeline for bacterial genome QC and assembly, developed as part of Georgia Tech's BIOL7210 Computational Genomics course.
This repository contains a Nextflow pipeline for performing quality control on sequencing reads and assembling genomic sequences.
Course: BIOL7210 - Computational Genomics
Author: S Birendra Kumar
Institution: Georgia Tech
GitHub Repo: https://github.com/Birendra-Kumar-S/bacterial_genomics_nextflow
Nextflow Version: 24.10.4.5934
Package manager: Conda
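This workflow performs read quality control, calculates statistics on the trimmed reads, and assembles genomic sequences.
The pipeline combines sequential and parallel steps: SKESA and SeqKit both depend on the FASTP output but not on each other, so Nextflow can run them concurrently once trimming has finished.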
1️⃣ Sequential Execution:
- FASTP → SKESA (Genome Assembly)
2️⃣ Parallel Execution:
- FASTP → SEQKIT (Read Statistics)
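To make the two execution modes concrete, here is a minimal shell sketch of the same dependency graph for a single sample. It is not the code in pipeline.nf: the helper names (`run`, `sketch_sample`), the output file names and the chosen flags are assumptions for illustration. FASTP runs first; SKESA and `seqkit stats` then start as background jobs, because each needs only the trimmed reads. Setting `DRY_RUN=1` prints the commands instead of executing them.

```bash
# Print a command when DRY_RUN=1; otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical single-sample version of the pipeline's DAG.
# Usage: sketch_sample <sample_id> <reads_R1> <reads_R2>
sketch_sample() {
  local sample="$1" r1="$2" r2="$3"
  local t1="${sample}_trimmed_R1.fastq.gz" t2="${sample}_trimmed_R2.fastq.gz"

  # Sequential step: FASTP must finish before anything downstream starts.
  run fastp -i "$r1" -I "$r2" -o "$t1" -O "$t2" \
    --json "${sample}_fastp.json" --html "${sample}_fastp.html" || return 1

  # Parallel step: SKESA and SeqKit both read the trimmed reads but do not
  # depend on each other, so they run as concurrent background jobs.
  run skesa --reads "${t1},${t2}" --contigs_out "${sample}_contigs.fasta" &
  local asm_pid=$!
  run seqkit stats -a -T -o "${sample}_read_stats.tsv" "$t1" "$t2" &
  local stats_pid=$!

  # Wait on each job separately so a failure in either one is reported.
  local status=0
  wait "$asm_pid" || status=1
  wait "$stats_pid" || status=1
  return "$status"
}
```

For example, `DRY_RUN=1 sketch_sample SRR1556296 R1.fastq.gz R2.fastq.gz` prints the three commands with FASTP first. In a Nextflow pipeline, this ordering typically comes from the channels connecting the processes rather than from an explicit `wait`.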
- Read Processing
  - Quality control and adapter trimming with FASTP (v0.24.0)
- Assembly
  - De novo genome assembly with SKESA (v2.5.1)
- Read Statistics
  - Summary statistics for the quality-filtered and trimmed reads, calculated with SeqKit (v2.10.0)
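The tabular report from `seqkit stats -T` (with `-a` adding extra columns such as sequence-length quartiles and N50) is easy to post-process. The hypothetical helper below selects columns by header name rather than position, so it should keep working if a SeqKit version adds columns. The column names in the example (`file`, `num_seqs`, `avg_len`) are ones SeqKit prints by default, but check them against your own report.

```bash
# Hypothetical helper: print selected columns of a `seqkit stats -T` report,
# looked up by header name rather than by position.
# Usage: stats_columns <stats.tsv> <column> [column ...]
stats_columns() {
  local tsv="$1"
  shift
  awk -F'\t' -v cols="$*" '
    NR == 1 {
      n = split(cols, want, " ")
      for (i = 1; i <= NF; i++) idx[$i] = i          # header name -> column index
      for (j = 1; j <= n; j++)
        if (!(want[j] in idx)) {
          print "missing column: " want[j] > "/dev/stderr"
          exit 1
        }
      next
    }
    {
      out = $(idx[want[1]])
      for (j = 2; j <= n; j++) out = out "\t" $(idx[want[j]])
      print out
    }' "$tsv"
}
```

Example (the file name is illustrative): `stats_columns SRR1556296_read_stats.tsv file num_seqs avg_len` prints one tab-separated line per input file and exits non-zero if a requested column is absent.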
System Version: macOS 15.3.2 (24D81)
OS: Sequoia 15.3.2
Model Name: MacBook Pro
Kernel: Darwin 24.3.0
Chip: Apple M4
Number of Cores: 10 (4 performance and 6 efficiency)
RAM: 16 GB
Nextflow: v24.10.5
Java: OpenJDK 22 (via Conda)

Diagram illustrating the pipeline's workflow, showing the sequence of processes and their dependencies, generated with Nextflow's built-in DAG visualization tool.
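The diagram can be regenerated from the command line. The sketch below assumes it is run from the repository root; the function name `render_dag` and the default output name `dag.html` are illustrative. `-with-dag` chooses the output format from the file extension (Graphviz is needed for `.png`, `.svg` or `.pdf`), and `-preview` is intended to skip process execution so the tools are not run; confirm both options with `nextflow run -h` for your Nextflow version.

```bash
# Hypothetical helper: render the workflow DAG without executing processes.
# Usage: render_dag [output_file]   (DRY_RUN=1 prints the command instead)
render_dag() {
  local out="${1:-dag.html}"
  local cmd=(nextflow run pipeline.nf -preview -with-dag "$out")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"   # print instead of executing
  else
    "${cmd[@]}"
  fi
}
```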
The test_data/ directory contains paired-end reads from Listeria monocytogenes (SRA accession: SRR1556296).
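Before running the pipeline on these or other paired-end files, a quick check that R1 and R2 hold the same number of reads can catch truncated downloads. This is a hypothetical helper, not part of the repository. It assumes standard 4-line FASTQ records and relies on `gzip -cdf`, which decompresses `.gz` input and passes plain FASTQ through unchanged.

```bash
# Make a failed gzip (for example, a missing file) fail the whole pipeline.
set -o pipefail

# Count reads in a FASTQ file (plain or gzipped); 4 lines per record.
count_reads() {
  gzip -cdf -- "$1" | awk -v f="$1" '
    END {
      if (NR % 4 != 0) { print f ": line count is not a multiple of 4" > "/dev/stderr"; exit 1 }
      print NR / 4
    }'
}

# Print the shared read count, or fail if R1 and R2 disagree.
check_pair() {
  local n1 n2
  n1="$(count_reads "$1")" || return 1
  n2="$(count_reads "$2")" || return 1
  if [[ "$n1" -ne "$n2" ]]; then
    echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
  echo "$n1"
}
```

Example, with placeholder paths since the file names in test_data/ are not listed here: `check_pair test_data/<sample>_1.fastq.gz test_data/<sample>_2.fastq.gz`.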
- Nextflow - Workflow engine (DSL2)
- FASTP - Read quality control
- SKESA - De novo assembly
- SeqKit - Read statistics
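A small pre-flight check can confirm which of these tools are already on `PATH`. This is a sketch, and `check_deps` is a hypothetical name. Assuming pipeline.nf declares a Conda environment for each process, running with `-with-conda` should let Nextflow install FASTP, SKESA and SeqKit itself, so normally only `nextflow` and `conda` need to be present beforehand.

```bash
# Hypothetical helper: report tools missing from PATH; returns 1 if any are missing.
# Usage: check_deps <tool> [tool ...]
check_deps() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `check_deps nextflow conda || echo "install the missing tools first"`.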
Run the following steps in order:
# Clone the repository
git clone https://github.com/Birendra-Kumar-S/bacterial_genomics_nextflow_pipeline
cd bacterial_genomics_nextflow_pipeline
We recommend creating a new Conda environment with Nextflow installed, as shown below:
CONDA_SUBDIR=osx-64 conda create -n nf_test -c conda-forge -c bioconda nextflow -y
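conda activate nf_test
# On Apple Silicon, request Intel (osx-64) packages, which run under Rosetta 2, for the Conda environments Nextflow creates
export CONDA_SUBDIR=osx-64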
nextflow run pipeline.nf -with-conda
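The last two steps can be wrapped in a small launcher. This is a hypothetical sketch, assuming it is run from the repository root inside the activated nf_test environment. It sets `CONDA_SUBDIR=osx-64` only on Apple Silicon Macs and forwards extra options to Nextflow, for example `-resume`, which reuses cached results from a previous run instead of recomputing them.

```bash
# Hypothetical launcher; DRY_RUN=1 prints the command instead of running it.
# Usage: launch [extra nextflow options]
launch() {
  # Apple Silicon Macs: request Intel (osx-64) Conda packages, as in the steps above.
  if [[ "$(uname -s)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then
    export CONDA_SUBDIR=osx-64
  fi
  local cmd=(nextflow run pipeline.nf -with-conda "$@")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Example: `launch -resume` after an interrupted run.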