DIANA: An integrated pipeline for analysis of long-read whole-genome sequencing data for molecular neuropathology
Diagnostic Integrated Analytics for Neoplastic Alterations (DIANA) is a comprehensive bioinformatics pipeline for analyzing neoplastic alterations in long-read whole-genome sequencing data. It integrates multiple analyses, including CNV detection, methylation profiling, structural variant calling, and MGMT promoter status determination.
The DIANA pipeline follows a modular architecture with three main Nextflow modules (Mergebam, Epi2me, and Annotation) that can be run independently or sequentially:
Pipeline workflow showing the flow from BAM files through Mergebam, Epi2me, and Annotation modules to final PDF reports.
- Docker (Desktop/Local) or Singularity/Apptainer (HPC)
- Java 11–21 (auto-installed by setup script if missing; required by Nextflow)
- Nextflow (auto-installed by setup script)
- Internet connection for downloading reference files from Zenodo
The pipeline now features a unified setup script that automatically downloads all reference files from Zenodo:
For Docker (Desktop/Local):
git clone https://github.com/VilhelmMagnusLab/Diana.git
cd Diana
./setup_pipeline.sh docker
./run_pipeline_docker.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
For Singularity/Apptainer (HPC):
git clone https://github.com/VilhelmMagnusLab/Diana.git
cd Diana
./setup_pipeline.sh singularity
./run_pipeline_singularity.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
What the setup script does:
- Checks for compatible Java (11–21) and installs it if missing
- Installs Nextflow and adds it to PATH via .diana_env
- Downloads all reference files from Zenodo (DOI: 10.5281/zenodo.19232427)
- Extracts and organizes files into the correct directory structure
- Downloads and sets up Docker containers or Singularity images
Note: First-time setup downloads ~14 GB of reference data and may take 10-30 minutes depending on your internet connection.
A minimal test dataset (diana_dummy) is automatically downloaded and extracted into data/diana_dummy/ by setup_pipeline.sh. It contains a single sample (diana-001) with a small BAM file and the required final_summary trigger file, mirroring the expected input structure.
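Mirroring the input structure documented below, the extracted layout looks like this (wildcard names stand in for the actual file names, which are not listed here):
data/diana_dummy/
└── diana-001/
    └── [subdirectory]/
        ├── *.bam
        ├── *.bam.bai
        └── final_summary_*_*_*.txt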
After setup completes, run the pipeline on the demo data:
# Docker
bash smart_sample_monitor_v2.sh -d data/diana_dummy
# Singularity/Apptainer
bash smart_sample_monitor_v2.sh --singularity -d data/diana_dummy
The monitor will detect the final_summary file in diana_dummy/diana-001/, trigger the full pipeline, and write results to ~/routine_diana/routine_results/diana-001/.
The sample ID files created by setup_pipeline.sh already contain diana-001/PBE00000; no manual configuration is needed.
The pipeline consists of three main modules that can be run independently or sequentially:
- Merges multiple BAM files per sample
- Extracts protein-coding regions of interest using roi.protein_coding.bed
Three independent analysis types:
| Analysis | Tool | Purpose | Output |
|---|---|---|---|
| Modified Base Calling | Modkit | DNA modifications (5mC, 5hmC) | *_wf_mods.bedmethyl.gz |
| Structural Variants | Sniffles2 | Structural variant detection | *.sniffles.vcf.gz |
| Copy Number Variation | QDNAseq | CNV detection | *_segs.bed, *_bins.bed, *_segs.vcf |
PacBio HiFi BAM files come off the sequencer unaligned. Before running the pipeline (specifically before modkit modified base calling), the BAM must be aligned to the reference genome. Use the following command:
samtools fastq -T MM,ML /path/to/input.hifi_reads.bam \
| minimap2 -y -ax map-hifi -t 4 /path/to/GRCh38_reference.fa - \
| samtools sort -@ 4 -o /path/to/output.hifi_reads.aligned.bam
# Then index the aligned BAM
samtools index /path/to/output.hifi_reads.aligned.bam
Important: The -T MM,ML flag in samtools fastq is required to preserve the base modification tags (MM and ML) that encode 5mC methylation information. Without these tags, modkit will not be able to extract methylation calls.
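To verify the tags survived the conversion, a quick spot check (illustrative, not part of the pipeline):
# Print the first few MM tags found in the aligned BAM; no output means the
# modification tags were dropped somewhere in the conversion
samtools view /path/to/output.hifi_reads.aligned.bam \
  | head -n 1000 \
  | grep -m 3 -o 'MM:Z:[^[:space:]]*'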
- MGMT methylation analysis using EPIC array sites
- NanoDx neural network classification with dual classifier support:
- Capper et al. classifier (default) - Optimized for brain tumors
- Pan-cancer classifier v5i - Broader tumor type coverage (use the --pancan flag)
- Structural variant annotation with Svanna
- SNV annotation with Clair3 (germline) and ClairS-TO (somatic), filtered by configurable depth and GQ thresholds
- CNV analysis with ACE tumor content determination
- Comprehensive reporting (HTML, IGV snapshots, Circos plots, Markdown)
The pipeline supports two NanoDx methylation classifiers:
| Classifier | Flag | Recommended For | Description |
|---|---|---|---|
| Capper et al. | (default) | Brain tumors | Default classifier optimized for CNS tumor classification |
| Pan-cancer v5i | --pancan | Broader tumor types | Extended classifier covering a wider range of tumor types |
Example usage:
# Default - Capper et al. classifier
./run_pipeline_singularity.sh --run_mode_order --sample_id SAMPLE_001
# Pan-cancer classifier
./run_pipeline_singularity.sh --run_mode_order --sample_id SAMPLE_001 --pancan
The --pancan flag works with all run modes and can be combined with any pipeline configuration.
The pipeline can be executed in different modes:
| Mode | Flag | Description | Use Case |
|---|---|---|---|
| Complete Pipeline | --run_mode_order | Runs all three modules sequentially (Mergebam → Epi2me → Annotation) | Starting from raw BAM files |
| Epi2me + Annotation | --run_mode_epiannotation | Runs Epi2me and Annotation sequentially (assumes merged BAM files exist) | When BAM files are already merged |
| Mergebam Only | --run_mode_mergebam | Merges BAM files and extracts regions of interest | BAM preparation only |
| Epi2me Only | --run_mode_epi2me [all\|modkit\|cnv\|sv\|snv] | Runs specific Epi2me analyses | Methylation, CNV, SV, or SNV calling |
| Annotation Only | --run_mode_annotation [all\|mgmt\|cnv\|svannasv\|terp\|snv\|rmd] | Runs specific downstream analyses | Report generation or specific analyses |
| Feature | Docker | Singularity/Apptainer |
|---|---|---|
| Best for | Desktop/Local | HPC/Shared systems |
| Setup Script | setup_pipeline.sh docker | setup_pipeline.sh singularity |
| Run Script | run_pipeline_docker.sh | run_pipeline_singularity.sh |
All containers are automatically downloaded from the vilhelmmagnuslab Docker Hub repository.
# Docker - Full pipeline starting from raw BAM files
./run_pipeline_docker.sh --run_mode_order --sample_id T001
# Singularity/Apptainer - Full pipeline starting from raw BAM files
./run_pipeline_singularity.sh --run_mode_order --sample_id T001
# Docker - Skip mergebam, run Epi2me and Annotation
./run_pipeline_docker.sh --run_mode_epiannotation --sample_id T001
# Singularity/Apptainer - Skip mergebam, run Epi2me and Annotation
./run_pipeline_singularity.sh --run_mode_epiannotation --sample_id T001
Docker Commands:
# Mergebam only
./run_pipeline_docker.sh --run_mode_mergebam
# Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me all # All Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me stat # QC statistics (cramino) only
./run_pipeline_docker.sh --run_mode_epi2me modkit # Modified base calling only
./run_pipeline_docker.sh --run_mode_epi2me cnv # CNV analysis only
./run_pipeline_docker.sh --run_mode_epi2me sv # Structural variants only
./run_pipeline_docker.sh --run_mode_epi2me snv # SNV calling (Clair3 + ClairS-TO) only
# Annotation modules
./run_pipeline_docker.sh --run_mode_annotation all # All analyses
./run_pipeline_docker.sh --run_mode_annotation mgmt # MGMT analysis only
./run_pipeline_docker.sh --run_mode_annotation cnv # CNV analysis only
./run_pipeline_docker.sh --run_mode_annotation svannasv # Svanna SV annotation only
./run_pipeline_docker.sh --run_mode_annotation terp # TERT promoter analysis only
./run_pipeline_docker.sh --run_mode_annotation snv # SNV annotation (Clair3 + ClairS-TO) only
./run_pipeline_docker.sh --run_mode_annotation rmd # Markdown report only
Singularity/Apptainer Commands:
# Mergebam only
./run_pipeline_singularity.sh --run_mode_mergebam
# Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me all # All Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me stat # QC statistics (cramino) only
./run_pipeline_singularity.sh --run_mode_epi2me modkit # Modified base calling only
./run_pipeline_singularity.sh --run_mode_epi2me cnv # CNV analysis only
./run_pipeline_singularity.sh --run_mode_epi2me sv # Structural variants only
./run_pipeline_singularity.sh --run_mode_epi2me snv # SNV calling (Clair3 + ClairS-TO) only
# Annotation modules
./run_pipeline_singularity.sh --run_mode_annotation all # All analyses
./run_pipeline_singularity.sh --run_mode_annotation mgmt # MGMT analysis only
./run_pipeline_singularity.sh --run_mode_annotation cnv # CNV analysis only
./run_pipeline_singularity.sh --run_mode_annotation svannasv # Svanna SV annotation only
./run_pipeline_singularity.sh --run_mode_annotation terp # TERT promoter analysis only
./run_pipeline_singularity.sh --run_mode_annotation snv # SNV annotation (Clair3 + ClairS-TO) only
./run_pipeline_singularity.sh --run_mode_annotation rmd # Markdown report only
# For annotation pipeline (with tumor content)
sample_id1 0.75 # 75% tumor content
sample_id2 # Auto-calculate with ACE
# For mergebam pipeline (with flowcell)
sample_id1 flowcell_id1
sample_id2 flowcell_id2
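A minimal sketch for creating the BAM-merging sample ID file at its documented location (the sample and flowcell IDs below are made up for illustration):
# Register two samples for BAM merging, one "sample_id flowcell_id" pair per line
cat > /data/routine_diana/sample_ids_bam.txt <<'EOF'
T001 FLOWCELL_A
T002 FLOWCELL_B
EOF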
The pipeline uses a standardized directory structure with separate input and output paths:
Pipeline directory:
/data/routine_diana/Diana/
├── conf/ # Configuration files
│ ├── mergebam.config # Mergebam module config
│ ├── epi2me.config # Epi2me module config
│ └── annotation.config # Annotation module config
├── modules/ # Nextflow modules
├── containers/ # Singularity container images
├── bin/ # Helper scripts
├── docs/ # Documentation
└── smart_sample_monitor_v2.sh # Automated monitoring script
Pipeline data directory (configured via params.path):
/data/
├── reference/ # Reference files (GRCh38, BED files, etc.)
└── humandb/ # Annotation databases
Input data directory (configured via params.input_dir in mergebam.config):
/data/WGS_[DATE]/ # Oxford Nanopore sequencing output
├── SAMPLE_01/ # Sample directory
│ └── [subdirectory]/ # Any subdirectory structure
│ ├── *.bam # BAM files from ONT sequencing
│ ├── *.bam.bai # BAM index files
│ └── final_summary_*_*_*.txt # Completion marker file
├── SAMPLE_02/
│ └── [subdirectory]/
│ ├── *.bam
│ ├── *.bam.bai
│ └── final_summary_*_*_*.txt
└── ...
Output directory (configured via params.path_output):
routine_diana/
├── sample_ids_bam.txt # Sample IDs for BAM merging
│
├── routine_bams/ # Processed BAM files (Mergebam module)
│ ├── merge_bams/ # Merged BAM files per sample
│ └── roi_bams/ # Region of interest extracted BAMs
│
├── routine_epi2me/ # Epi2me module results
│ └── [sample_id]/
│ ├── *.wf_mods.bedmethyl.gz # Methylation calls (modkit)
│ ├── *.sniffles.vcf.gz # Structural variants (Sniffles2)
│ ├── *_segs.bed # CNV segments (QDNAseq)
│ ├── *_bins.bed # CNV bins
│ ├── *_copyNumbersCalled.rds # CNV RDS file for ACE
│ ├── clair3/ # Germline SNV calling (Clair3)
│ │ └── *.vcf.gz
│ └── clairs-to/ # Somatic SNV calling (ClairS-TO)
│ └── *.vcf.gz
│
├── routine_annotation/ # Analysis module results (detailed outputs)
│ └── [sample_id]/
│ ├── classifier/ # Tumor classification
│ │ ├── nanodx/ # NanoDx neural network results
│ │ └── sturgeon/ # Sturgeon methylation classifier
│ ├── cnv/ # CNV analysis
│ │ ├── ace/ # ACE tumor content estimation
│ │ ├── annotatedcnv/ # Annotated CNV calls
│ │ └── *.pdf # CNV plots (chr7, chr9, full genome)
│ ├── coverage/ # IGV coverage snapshots
│ │ ├── *_egfr_coverage.pdf
│ │ ├── *_idh1_coverage.pdf
│ │ ├── *_idh2_coverage.pdf
│ │ └── *_tertp_coverage.pdf
│ ├── cramino/ # BAM statistics
│ │ └── *_cramino_statistics.txt
│ ├── merge_annot_clair3andclairsto/ # Variant annotation
│ │ └── *_merge_annotation_filter_snvs_allcall.csv
│ ├── methylation/ # MGMT methylation analysis
│ │ └── *_MGMT_results.csv
│ └── structure_variant/ # SV annotation
│ ├── *_circos.pdf # Circos plot
│ ├── *_fusion_events.tsv # Fusion events
│ └── *_svanna_annotation.html # Svanna SV annotation
│
└── routine_results/ # Final published reports (per sample)
└── [sample_id]/
├── [sample_id]_bedmethyl_sturgeon_general.pdf # Sturgeon classification
├── [sample_id]_markdown_pipeline_report.pdf # Main comprehensive report
├── [sample_id]_mnpflex_input.bed # MNP-Flex input format
├── [sample_id]_occ_svanna_annotation.html # SV annotation HTML
└── [sample_id]_tsne_plot.html # t-SNE visualization
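A quick optional check (a sketch, assuming the default /data/routine_diana output root shown in the configuration section; adjust to your params.path_output) that the main published reports exist for a sample:
# Verify the key published outputs exist and are non-empty for one sample
sample_id=T001
out=/data/routine_diana/routine_results/$sample_id
for f in "${sample_id}_markdown_pipeline_report.pdf" "${sample_id}_tsne_plot.html"; do
  [ -s "$out/$f" ] && echo "OK: $f" || echo "MISSING: $f"
done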
The setup_pipeline.sh script automatically downloads and sets up all required reference files from Zenodo.
Simply run:
./setup_pipeline.sh docker # For Docker users
# or
./setup_pipeline.sh singularity # For Singularity users
The script will:
- Download reference data from Zenodo (DOI: 10.5281/zenodo.19232427)
- Extract and organize all files into the correct directory structure
- Set up NanoDx classifier models
- Configure all required paths
If you prefer manual setup or need to customize the reference files:
Core reference files (automatically placed in data/reference/):
reference_core.tar.gz - Contains the GRCh38 reference genome, BED files, and annotations, including:
- GRCh38.fa and GRCh38.fa.fai - Human reference genome
- EPIC_sites_NEW.bed - Methylation sites
- MGMT_CpG_Island.hg38.bed - MGMT CpG islands
- roi.protein_coding.bed - Region of interest BED file (protein-coding genes for SNV screening and BAM extraction)
- TERTp_variants.bed - TERT promoter variants
- human_GRCh38_trf.bed - Tandem repeat regions
- CNV_genes_tuned.csv - CNV gene annotations
- occ_fusions_genes.txt - User-defined region of interest gene list for SV/fusion filtering and SNV annotation (one gene per line; can be replaced with any custom gene list)
- nanoDx/ - NanoDx neural network classifier (with models from Zenodo)
Annotation databases (automatically placed in data/humandb/):
humandb.tar.gz - Contains ANNOVAR annotation databases:
- hg38_refGene.txt - RefGene annotation
- hg38_refGeneMrna.fa - RefGene mRNA sequences
- hg38_clinvar_20240611.txt - ClinVar annotations
- hg38_cosmic100coding2024.txt - COSMIC annotations
Additional reference files (automatically extracted to data/reference/):
- general.zip - Sturgeon classifier model (kept as zip, not extracted)
- Assembly.zip - Assembly folder for vcfcircos visualization (automatically extracted)
- r1041_e82_400bps_sup_v420.zip - ONT basecalling model for ClairS-TO (automatically extracted)
- svanna-data.zip - Svanna structural variant annotation database (optional, automatically extracted)
Note on roi.protein_coding.bed: This ROI BED file uses OCC (Onco-Comprehensive-Coverage) genes but can be substituted with any custom ROI BED file. It's used for:
- Extracting regions of interest during BAM merging (mergebam module)
- SNV screening regions for variant calling (ClairS-TO analysis)
- Ensure proper BED format with exactly 10 tab-separated fields per line
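A one-liner sketch to verify the 10-field requirement above before swapping in a custom ROI BED (awk assumed available):
# Report any lines that do not have exactly 10 tab-separated fields
awk -F'\t' 'NF != 10 {print FILENAME ": line " NR " has " NF " fields"}' data/reference/roi.protein_coding.bed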
Note on occ_fusions_genes.txt: Plain-text gene list (one gene symbol per line) used for SV/fusion event filtering and SNV annotation. This file can be replaced with any custom gene list of interest — for example, a laboratory-specific panel of oncology-relevant genes. The default list contains 204 genes covering common fusion partners and oncogenes.
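Swapping in a lab-specific panel is simply a matter of overwriting the file; a minimal sketch (the gene symbols below are chosen only for illustration):
# Replace the default list with a custom panel, one gene symbol per line
printf '%s\n' EGFR IDH1 IDH2 TERT BRAF > data/reference/occ_fusions_genes.txt
wc -l data/reference/occ_fusions_genes.txt   # sanity check: expected number of genes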
Manual download: If needed, all reference files are available at Zenodo (DOI: 10.5281/zenodo.19232427)
After downloading the reference files, your directory structure should look like this:
data/
├── reference/ # Reference files
│ ├── GRCh38.fa
│ ├── GRCh38.fa.fai
│ ├── gencode.v48.annotation.gff3
│ ├── Assembly/ # Assembly folder for vcfcircos (from Zenodo)
│ ├── EPIC_sites_NEW.bed
│ ├── MGMT_CpG_Island.hg38.bed
│ ├── roi.protein_coding.bed
│ ├── TERTp_variants.bed
│ ├── human_GRCh38_trf.bed
│ ├── CNV_genes_tuned.csv
│ ├── occ_fusions_genes.txt
│ └── etc
│
└── humandb/ # Annotation databases
├── hg38_refGene.txt
├── hg38_refGeneMrna.fa
├── hg38_clinvar_20240611.txt
└── hg38_cosmic100coding2024.txt
The pipeline intelligently handles tumor content:
- Provided value: Use directly if specified in sample ID file
- Auto-calculation: ACE analyzes copy number profiles to estimate tumor cellularity
- Multiple estimates: ACE provides several estimates and selects the best fit
- Results: Saved in ${sample_id}_ace_results/threshold_value.txt
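To see which tumor content value was actually used for a sample, you can read that file directly (a sketch; the results directory is assumed to sit under the sample's annotation output, e.g. routine_annotation/<sample_id>/cnv/ace/):
# Show the tumor content selected by ACE (or the value provided in the sample ID file)
sample_id=T001
cat "${sample_id}_ace_results/threshold_value.txt"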
PDF reports are automatically generated when running the pipeline with the following modes:
- --run_mode_annotation rmd - Generate reports only
- --run_mode_order - Run the complete pipeline sequentially and generate reports
- --run_mode_epiannotation - Run the Epi2me and Annotation modules and generate reports
The reports are automatically created in the routine_results/{sample_id}/ directory with the name {sample_id}_markdown_pipeline_report.pdf.
The generate_report.sh script is provided for additional report generation in cases where:
- You want to regenerate reports after re-running specific processes
- You need to create reports for samples that were processed separately
- You need to generate reports after the pipeline has already completed
The pipeline uses three main path parameters that must be configured:
1. Pipeline Data Path (params.path) - Reference files and databases
// conf/annotation.config, conf/epi2me.config, conf/mergebam.config
params {
path = "/data/routine_diana/Diana/data"
// Contains: reference/, humandb/ directories
}
2. Input Data Path (params.input_dir) - ONT sequencing output
// conf/mergebam.config
params {
input_dir = "/data/WGS_27102025"
// Contains: Sample directories with BAM files
// Can be overridden via CLI: --input_dir or smart_sample_monitor -d
}
3. Output Path (params.path_output) - Pipeline results
// conf/mergebam.config, conf/epi2me.config, conf/annotation.config
params {
path_output = "/data/routine_diana"
// Contains: sample_ids_bam.txt, routine_bams/, routine_epi2me/, routine_results/
}
Key Points:
- params.path: Reference data (rarely changes)
- params.input_dir: ONT sequencing input (changes per run)
- params.path_output: Where all results are stored (consistent location)
- The input_dir can be overridden using the --input_dir flag or smart_sample_monitor_v2.sh -d (see the example below)
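For example, a one-off run against a new sequencing folder without editing mergebam.config (the sample ID and path are illustrative):
# Override the configured input directory for this run only
./run_pipeline_docker.sh --run_mode_order --sample_id T001 --input_dir /data/WGS_NEWRUN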
The pipeline includes configurable quality thresholds for SNV filtering in the final reports:
// conf/annotation.config
params {
snv_depth_threshold = 10 // Minimum sequencing depth (default: 10)
snv_gq_threshold = 10 // Minimum Genotype Quality (default: 10)
}
How Filtering Works:
- Depth threshold: Filters out variants with sequencing depth below the threshold
- GQ threshold: For variants with multiple GQ values from different callers (e.g., "20,26,41"), keeps the variant if ANY value meets the threshold (see the sketch after this list)
- Both filters must pass for a variant to appear in the final report
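The ANY-value GQ rule can be pictured with a small awk sketch. This is illustrative only, not the pipeline's actual filtering code, and the column positions (depth in field 4, GQ list in field 5) are assumptions:
# Keep a row only if depth passes AND at least one comma-separated GQ value
# (e.g. "20,26,41") meets the threshold
awk -F'\t' -v dp=10 -v gq=10 '{
  if ($4 + 0 < dp) next
  n = split($5, vals, ",")
  for (i = 1; i <= n; i++)
    if (vals[i] + 0 >= gq) { print; next }
}' variants.tsv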
Examples:
# Stricter filtering (higher quality variants only)
snv_depth_threshold = 15
snv_gq_threshold = 20
# More permissive filtering (include more variants)
snv_depth_threshold = 5
snv_gq_threshold = 5
Note: These thresholds only affect the variants shown in the Markdown PDF reports. The raw VCF files contain all called variants regardless of these filters.
Choose your preferred container engine and run the unified setup script:
# For Docker
./setup_pipeline.sh docker
# For Singularity/Apptainer
./setup_pipeline.sh singularity
The setup script handles Java, Nextflow, reference files, and container images in one step.
You can specify a custom temporary work directory using the -w flag. This is useful for:
- Managing disk space on different storage locations
- Avoiding permission issues
- Organizing temporary files
Example:
# Docker
./run_pipeline_docker.sh --run_mode_annotation terp -w /path/to/your/work/dir
# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_annotation terp -w /home/chbope/extension/trash/tmp
Note: The -w flag sets Nextflow's work directory, where temporary files and intermediate results are stored during pipeline execution. By default, Nextflow creates a work folder in the current working directory.
The pipeline includes smart_sample_monitor_v2.sh for automated monitoring and processing of Oxford Nanopore sequencing runs. This intelligent script continuously monitors sample directories and automatically triggers the pipeline when sequencing completes.
Monitoring & Execution:
- Real-time Monitoring: Watches for final_summary_*_*_*.txt files indicating completed sequencing
- Automatic Pipeline Triggering: Starts processing immediately when samples are ready
- Sequential Processing: Processes one sample at a time, queuing others
- Markdown Report Validation: Verifies successful completion before marking as done
Version 2 Enhancements:
- CLI Data Directory Override: --data-dir takes precedence over mergebam.config
- Resume Control: Disabled by default for fresh runs; use -r to enable caching
- Symlink Resolution: Works correctly when installed as a global command
- Portable Execution: Automatically finds pipeline directory from any location
- Sample IDs File: Hardcoded to /data/routine_diana/sample_ids_bam.txt
# Run from pipeline directory with default config (auto-detects Singularity or Docker)
./smart_sample_monitor_v2.sh
# Monitor specific data directory (overrides config)
./smart_sample_monitor_v2.sh -d /data/WGS_27102025
# Enable resume for cached results
./smart_sample_monitor_v2.sh -d /data/WGS_27102025 -r
# Verbose logging
./smart_sample_monitor_v2.sh -d /data/WGS_27102025 -v
# Combination: resume + verbose
./smart_sample_monitor_v2.sh -d /data/WGS_27102025 -r -v
# Force Docker (useful when both Docker and Singularity are available)
./smart_sample_monitor_v2.sh --docker -d /data/WGS_27102025
# Force Singularity/Apptainer
./smart_sample_monitor_v2.sh --singularity -d /data/WGS_27102025
# Explicit engine flag (equivalent to --docker / --singularity)
./smart_sample_monitor_v2.sh -e docker -d /data/WGS_27102025 -r -v
./smart_sample_monitor_v2.sh -e singularity -d /data/WGS_27102025 -r -v
Install the monitor as a global command accessible from any directory:
User-level installation (Recommended - No sudo required):
# Create user bin directory and symbolic link
mkdir -p ~/bin
ln -sf /data/routine_diana/Diana/smart_sample_monitor_v2.sh ~/bin/smart_sample_monitor
# Add ~/bin to PATH (run once)
cat >> ~/.bashrc << 'EOF'
# Add user's bin directory to PATH
if [ -d "$HOME/bin" ]; then
export PATH="$HOME/bin:$PATH"
fi
EOF
# Activate changes
source ~/.bashrc
# Verify installation
which smart_sample_monitor
System-wide installation (Requires sudo):
sudo ln -sf /data/routine_diana/Diana/smart_sample_monitor_v2.sh /usr/local/bin/smart_sample_monitor
Then use from anywhere:
# Run from any directory
cd /tmp
smart_sample_monitor -d /data/WGS_27102025 -v
# Monitor with custom work directory
smart_sample_monitor -d /data/WGS_27102025 -w /data/trash -r
# Force Docker from anywhere
smart_sample_monitor --docker -d /data/WGS_27102025 -v
| Option | Long Form | Description | Default |
|---|---|---|---|
| -d | --data-dir | Base data directory (overrides config) | Auto-detect from config |
| -p | --pipeline | Pipeline base directory | Auto-detected |
| -w | --workdir | Nextflow work directory | /data/trash |
| -c | --config | Config file to parse | conf/mergebam.config |
| -i | --interval | Check interval in seconds | 300 (5 min) |
| -t | --timeout | Maximum wait time in seconds | 432000 (5 days) |
| -e | --engine | Container engine: singularity, apptainer, or docker | Auto-detect |
| | --docker | Shorthand for --engine docker | - |
| | --singularity | Shorthand for --engine singularity | - |
| -r | --resume | Enable Nextflow resume | Disabled |
| -v | --verbose | Enable verbose logging | Disabled |
| -h | --help | Show help message | - |
- Initialize: Load sample IDs from /data/routine_diana/sample_ids_bam.txt
- Monitor: Check each sample directory for final_summary_*_*_*.txt (a way to simulate this trigger is sketched below)
- Queue: Mark ready samples for processing
- Execute: Run --run_mode_order for each sample sequentially
- Validate: Check for markdown report generation
- Report: Display final status summary
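As referenced in the Monitor step above, you can simulate a completed run when testing the monitor. The path, subdirectory, and run identifiers below are made up; only the final_summary_*_*_*.txt pattern matters:
# Create a dummy completion marker so the monitor picks up SAMPLE_01
touch /data/WGS_27102025/SAMPLE_01/run1/final_summary_FLOWCELL_A_runid_0.txt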
This script is essential for routine ONT sequencing workflows where:
- Multiple samples complete sequencing at different times
- Immediate processing is desired upon completion
- Manual monitoring would be time-consuming and error-prone
- Consistent processing workflow is required
Instead of manually checking and starting the pipeline for each sample, the monitor automatically detects completion and starts processing immediately, maximizing throughput and reducing manual intervention.
Important: Ensure all paths are correctly configured in conf/mergebam.config:
- params.path: Reference data directory
- params.input_dir: Default input directory (can be overridden with -d)
- params.path_output: Output results directory
See docs/GLOBAL_COMMAND_SETUP.md for detailed installation, troubleshooting, and advanced usage.
- Container engine conflict: Ensure only one container system is enabled
- Missing reference files: Download required external files
- Permission issues: Check container and file permissions
# Check containers
docker images | grep vilhelmmagnuslab # Docker
ls -la containers/*.sif # Singularity
# Test pipeline
./test_pipeline_docker.sh # Docker
./test_pipeline_singularity.sh # Singularity
- Documentation:
- DOCKER_SETUP.md - Docker installation and setup
- SINGULARITY_SETUP.md - Singularity/Apptainer setup
- docs/GLOBAL_COMMAND_SETUP.md - Global command installation
- Issues: GitHub Issues
- Contact:
- Christian Domilongo Bope (chbope@ous-hf.no / christianbope@gmail.com)
- Skarphedinn Halldorsson (skahal@ous-hf.no / skabbi@gmail.com)
- Richard Nagymihaly (ricnag@ous-hf.no)
If you use this pipeline in your research, please cite:
Bope CD, Nagymihaly R, Halldorsson S, et al. DIANA: Diagnostic Integrated Analytics for Neoplastic Alterations, a long-read whole-genome sequencing pipeline for molecular neuropathology. 2026. https://doi.org/10.64898/2026.03.25.714119
This project is licensed under the MIT License - see the LICENSE file for details.
The Diagnostic Integrated Analytics for Neoplastic Alterations (DIANA) pipeline is an investigational research tool that has not undergone full clinical validation. Any clinical use or interpretation of its results is entirely at the discretion and responsibility of the treating physician.
