Releases: theiagen/public_health_bioinformatics
v4.1.0
This release expands automated quality control to new workflows and includes influenza segment thresholds; standardizes adapter, primer, and read trimming; and updates software versions. Documentation updates and various bug fixes are also implemented.
See more details regarding these changes here!
🚀 Changes to existing workflows
Changes to genomic characterization workflows
All Consensus Assembly Workflows
- percent_mapped_reads is correctly calculated
All Illumina workflows
- fastp Docker and JSON reports are now available outputs
All Viral Workflows
- Nextclade dataset tags are updated
- Pangolin is updated to version 4.3.4-pdata-1.37
- IRMA flu aligned reads are now extracted and output
All TheiaCoV, all TheiaProk, and TheiaEuk Illumina PE workflows
- QC Check is now case-insensitive and accepts FastQC read quality control as input
All TheiaCoV, all TheiaProk, Freyja, tbprofiler-tNGS, and TheiaEuk Illumina workflows
- Support for adapter trimming via Trimmomatic is added
All TheiaProk workflows
- Vibecheck is updated
All TheiaViral workflows
- CheckV and Krakentools soft-fail if de novo assembly quality is low / reads cannot be extracted
All TheiaViral Illumina workflows
- TheiaViral allows for finer read quality control inputs and defaults to fastp for read trimming (breaking change)
Freyja Workflows
- Freyja is updated, expands quality control features, and the Freyja Update workflow is removed
TheiaCoV Illumina PE and TheiaCoV ONT
- Segment-based QC Check for influenza is added
TheiaViral Illumina PE
- Support for primer trimming is added
Changes to phylogenetic workflows
All Phylogenetic workflows
- Summarize Data now matches specific names when generating its output table
Snippy_Tree
- IQ-TREE bootstraps input variable name is now ultrafast_bootstraps
Changes to data import workflows
BaseSpace_Fetch
- More robust project/run ID file matching is implemented
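The Snippy_Tree input rename listed above (bootstraps to ultrafast_bootstraps) may require updating saved workflow input files. The following is a minimal sketch, assuming inputs are stored as a JSON object whose dotted keys end in the variable name. The rename_input helper and the snippy_tree_wf.iqtree2 key prefix are hypothetical and used only for illustration; check your own input file for the real key names.

```python
import json

def rename_input(inputs: dict, old_name: str, new_name: str) -> dict:
    """Return a copy of inputs where keys whose last dotted component
    equals old_name are renamed to end in new_name instead."""
    renamed = {}
    for key, value in inputs.items():
        prefix, _, last = key.rpartition(".")
        if last == old_name:
            key = f"{prefix}.{new_name}" if prefix else new_name
        renamed[key] = value
    return renamed

# Hypothetical inputs: the key prefix is an assumption, not taken from the workflow.
old_inputs = {
    "snippy_tree_wf.iqtree2.bootstraps": 1000,
    "snippy_tree_wf.tree_name": "demo",
}
new_inputs = rename_input(old_inputs, "bootstraps", "ultrafast_bootstraps")
print(json.dumps(new_inputs, indent=2))
```

Matching on the last dotted component only keeps unrelated keys that merely contain the word "bootstraps" untouched.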
📚 Documentation updates
- All workflow input table default values are synchronized, including embedded defaults
- New SOPs are added
- TheiaViral VSP genomic characterization modules and outputs are given detailed explanations
- Freyja Workflow Series diagram is updated
What's Changed
- [PHB] Set defaults to automatically propagate to I/O tables by @xonq in #960
- [Assembly_Metrics] Correctly calculate percent_mapped_reads by @xonq in #967
- [Docs] Expand VSP output descriptions by @xonq in #975
- [IQ-TREE] Specificity of Parameters by @awh082834 in #979
- [TheiaViral | Read_QC_Trim_PE] Simplify TheiaViral and Theia* Illumina PE trimming by @xonq in #970
- [TheiaProk | TheiaEuk | TheiaCoV] Enable Illumina QC Check Table FASTQC read number input by @xonq in #976
- [TheiaViral] Enable Krakentools + CheckV soft-fail by @xonq in #964
- [Documentation] Update 1 SOP by @nehavm456 in #978
- [Documentation] Updating 1 SOP by @nehavm456 in #973
- [Summarize_Data] Ensure specific string matching by @xonq in #977
- [Freyja] Update to Freyja2 and more additions to freyja by @Michal-Babins in #961
- [FastP] Expose fastp docker and json by @xonq in #989
- [TheiaViral | Trimmomatic] Support for Adapter and Primer Trimming by @MrTheronJ in #969
- [Vibecheck] Fix subsampling arg declaration by @xonq in #982
- [Organism Parameters] Update nextclade dataset tags and pangolin docker version by @Michal-Babins in #993
- [Documentation] Update SOP entries for TheiaProk Illumina PE v3 and v4 by @brunatodani in #991
- [Documentation] Update BaseSpace_Fetch_PHB SOP to v4 by @cimendes in #994
- [Basespace_Fetch] Extending grep -E to project ID track by @awh082834 in #981
- [Flu Track] Expose and deinterleave IRMA aligned reads by @awh082834 in #990
- [TheiaViral_Panel] Turn Nextclade Outputs Generic by @awh082834 in #997
- [QC_Check | Flu_Track] Refactor and implement segment QC-check by @xonq in #980
- [bbduk] Add pre-alignment primer trimming by @MrTheronJ in #998
- [v4.1.0] Release preparation by @xonq in #995
- [Assembly Stats] Update samtools by @xonq in #1000
New Contributors
- @nehavm456 made their first contribution in #978
- @brunatodani made their first contribution in #991
Full Changelog: v4.0.0...v4.1.0
v4.0.0
Public Health Bioinformatics v4.0.0 Major Release Notes
This release adds three new workflows, reworks the organism-specific characterization logic in TheiaCoV and TheiaViral, and makes significant improvements to many workflows. Documentation updates and various bug fixes have also been implemented.
There are several breaking changes in this release that prevent backwards compatibility. We’ve marked these items in the release notes with "breaking change" in the header. To learn how to migrate to this release, please see our migration guide: Migration to PHB v4.
See more details regarding each of these changes here!
🆕 New Workflows
- TheiaViral_Panel_PHB
- TheiaViral_Panel is a workflow that incorporates the assembly approach of TheiaViral_Illumina_PE into a panel-compatible format. By using a set of taxon IDs, reads that are specific to each included taxon are extracted for attempted genome assembly and any applicable viral characterization.
- Import this workflow from Dockstore.
- PhyloCompare_PHB
- PhyloCompare generates cophylogeny plots that visualize the differences in two phylogenetic trees’ branching orders and tip arrangements (topology). PhyloCompare includes an additional quantitative validation module, which can validate that two phylogenies have the same topology using distance metrics.
- Import this workflow from Dockstore.
- ONT_Barcode_Concatenation_PHB
- ONT read data sometimes requires concatenation by barcode. This workflow enables easy concatenation of your read data and adds it to a new or existing Terra table.
- Import this workflow from Dockstore.
🚀 Changes to existing workflows
- All workflows that characterize viral pathogens
- morgana_magic, a new subworkflow, now controls all viral characterization logic (breaking change)
- All TheiaCoV Workflows
- Nextclade was updated to version v3.16.0
- Nextclade dataset tags have been updated
- Pangolin was updated to version 4.3.3-pdata-1.36
- VADR was updated to version 1.6.4
- VADR now supports measles, mumps, and rubella
- IRMA and iVar now summarize minor alleles for each influenza segment
- Additional read quality score metrics have been added to the TheiaCoV workflows
- TheiaCoV_FASTA
- Influenza characterization is now supported
- The qc_check_phb module was renamed to qc_check_task to match other workflows (breaking change)
- All TheiaEuk workflows and Cauris_CladeTyper
- The C. auris CladeTyper tool now includes the Clade VI reference.
- The Clade I reference has been updated to use a complete genome.
- All TheiaEuk and TheiaProk workflows
- Read Screen handles cryptic errors better
- All TheiaProk workflows
- AMRFinderPlus gene outputs are now alphabetized
- The database of AMRFinderPlus was updated to version 2025-07-16.1
- The Bakta proteins input parameter was corrected to be a File type variable
- SeqSero2 has been deprecated in favor of SeqSero2S (breaking change)
- ECTyper has been updated to 2.0.0
- ResFinder now has additional and updated outputs
- All TheiaProk workflows and Gambit_Query
- The GAMBIT prokaryotic database was updated to version v2.1.0
- TheiaProk_Illumina_PE, TheiaProk_ONT, and TBProfiler_tNGS
- TBProfiler VCF output is now appropriately being captured
- TBProfiler database branches can now be specified (breaking change)
- tbp-parser min_depth is now explicitly set to 10
- tbp-parser coverage calculations are now correct when tngs_data is set to true
- TBProfiler_tNGS
- The Trimmomatic bases_to_crop default value has been removed (breaking change)
- The Trimmomatic module is now optional
- Clockwork read decontamination is now available as an optional module
- Read statistics are now generated with fastq_scan
- All TheiaViral workflows
- TheiaViral now incorporates genomic characterization modules for an extended range of pathogens
- Hosting of internally versioned databases and taxonomy
- Several inputs have changed location (breaking change)
- All ONT workflows
- The read_qc_trim module was renamed to read_QC_trim to match other workflows (breaking change)
- Samples_to_Ref_Tree is now Nextclade_Batch
- Samples_to_Ref_Tree has been renamed to Nextclade_Batch and now has updated error handling (breaking change)
- Augur
- Augur and Augur_Prep have been updated to v35.1.0 and revamped to improve performance (breaking change)
- Core_Gene_SNP
- snp-sites is now used in core tree generation
- Mercury_Prep_N_Batch
- Mercury was updated to version 1.1.3
- BankIt FASTA and metadata files are now workflow outputs
- Terra_2_NCBI
- Read files are now renamed to their corresponding library_ID field
- BaseSpace_Fetch
- Discrepant separators ("_" vs "-") between sample names and BaseSpace entities can now be handled
- Only intended samples are now pulled from BaseSpace
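The BaseSpace_Fetch separator handling noted above can be illustrated with a small sketch. This is not the workflow's actual implementation: the release notes say only that "_" vs "-" discrepancies are handled and that only intended samples are pulled. The function names and sample names below are hypothetical.

```python
def normalize_separators(name: str) -> str:
    """Treat "_" and "-" as equivalent by mapping both to "-"."""
    return name.replace("_", "-")

def match_samples(requested, basespace_entities):
    """Map each requested sample name to the BaseSpace entity whose name is
    identical after separator normalization, or None when nothing matches.
    Exact (not prefix) matching keeps look-alike samples from being pulled."""
    by_normalized = {normalize_separators(e): e for e in basespace_entities}
    return {s: by_normalized.get(normalize_separators(s)) for s in requested}

# Made-up names: "sample-010" must not be selected when "sample_01" is requested.
entities = ["sample-01", "sample-010", "control_A"]
matches = match_samples(["sample_01", "control-A", "missing"], entities)
print(matches)
```

Comparing whole normalized names, rather than substrings, is one way to avoid fetching a sample whose name merely starts with the requested name.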
📚 Documentation updates
- All workflow input and output tables were synced and are now completely up-to-date
- Dockstore links have been added to the “Quick Facts” section for every workflow for easier import
- New SOPs were added
- The runtime section in our code contribution guide now specifies disk and disks (Thanks, Ash O’Farrell!)
- TheiaCoV documentation has been overhauled and reorganized
- The TheiaCoV_Illumina and TheiaCoV_ONT workflow diagrams have been updated
- TheiaProk documentation has been overhauled and reorganized
- The TheiaProk_ONT workflow diagram has been updated
- TheiaValidate documentation was updated
- Typos have been eliminated, and in many places, clarity was finally restored
What's Changed
- [PhyloCompare] Create phylogenetic comparison and validation workflow by @xonq in #771
- [TheiaProk] Add theiaprok_ont diagram by @cimendes in #881
- [TBProfiler] Capture VCF Output by @MrTheronJ in #884
- [Documentation - TheiaCoV] update theiacov diagram by @cimendes in #888
- [PhyloCompare] Add cophylogeny plot generation module, update docs, and versions by @xonq in #889
- [PhyloCompare | TheiaViral] Documentation update by @xonq in #890
- [TheiaCov_FASTA] Add support for Influenza by @MrTheronJ in #872
- [Documentation] Add Dockstore links to Quick Facts by @sage-wright in #894
- [Samples_to_Ref_Tree -> Nextclade_Batch] Identify legacy reference trees by @xonq in #887
- [AMRFinderPlus] Alphabetize gene outputs for AMRFinderPlus string outputs by @awh082834 in #897
- [Documentation] Improve TheiaCoV Documentation by @sage-wright in #896
- [TheiaViral] Implement genomic characterization for other viruses by @xonq in #893
- [AMRFinderPlus] Update to database version 2025-07-16.1 by @awh082834 in #902
- [Mercury_Prep_N_Batch] Updating Mercury Version; Exposing Bankit files and changing metadata to a TSV by @awh082834 in #904
- [TheiaEuk | CladeTyper] Add Clade VI reference and implement CladeTyper thresholding by @xonq in #871
- [Nextclade] Update Nextclade to v3.16.0 and update dataset tags by @awh082834 in #907
- [VADR] Refactor and Add Support for Additional Viruses by @MrTheronJ in #882
- [Documentation] Sync All Inputs and Outputs by @MrTheronJ in #906
- [ECTyper] Update to 2.0.0 with added outputs and documentation updates by @awh082834 in #847
- [Documentation] Fix formatting in TheiaValidate and link typo in digger_denovo by @theiadeb in #912
- [Documentation] TheiaProk task overhaul by @sage-wright in #913
- [ONT_Barcode_Concatenation] New workflow to concatenate ONT barcodes by @sage-wright in #900
- [Documentation] TheiaCoV assembly descriptions and other fixes by @sage-wright in #916
- [Bakta] Change Boolean Input to Correct File Designation by @awh082834 in https://github.com/theiage...
v3.1.1
Public Health Bioinformatics v3.1.1 Patch Release Notes
🩹 This patch release reverts ts_mlst default behavior to pre-v3.1.0 status and enables specific use-case ts_mlst input options, alongside other minor changes
Find our full release notes here!
Find our documentation here!
Bug Fixes
- This patch release resolves unintended typing of E. coli as Aeromonas and other organisms with similar alleles. When an E. coli sample was passed, alleles of Aeromonas could be called before all of the alleles associated with E. coli were identified, which could cause a mischaracterization as Aeromonas. Users can now use mlst_scheme_override to toggle on/off the exclusion of problematic allele sets with E. coli samples. By default, this behavior is turned off ("false") because it is targeted for a specific use-case. We generally recommend users do not enable this option because it will overwrite the default sets excluded by ts_mlst. Please reach out if you would like to discuss this option.
- Reverted ts_mlst to not run both schemes for E. coli and A. baumannii by default. The previous release (v3.1.0) implemented a change to ts_mlst that by default enabled both schemes associated with E. coli and A. baumannii. While this behavior may still be desirable, we inadvertently introduced it in a way that users could not "opt in" to this functionality. Users can now decide if they wish to see single or double MLST scheme outputs in a single output column, using the mlst_run_secondary_scheme boolean input. By default, this behavior is turned off ("false").
- Fixes a Mercury error where GISAID metadata would not populate when skip_ncbi was true for Mpox.
Other Updates
- The TheiaProk_Illumina diagram is updated to include digger_denovo submodules.
- Samples_to_Ref_Tree is updated to include multiple genomes in a Nextclade phylogeny by accepting multiple samples as an input array.
What's Changed
- [Documentation] update theiaprok_ilmn diagram by @cimendes in #875
- [TS_MLST] Refactor of E coli, A baumannii secondary scheme selection and scheme override by @awh082834 in #873
- [Samples_to_Ref_tree] Multi-sample genome input by @xonq in #868
- [Mercury] update mercury version by @xonq in #879
- [TS_MLST] Update docs to include more info regarding override usage by @awh082834 in #880
Full Changelog: v3.1.0...v3.1.1
v3.1.0
Public Health Bioinformatics v3.1.0 Minor Release Notes
This minor release adds four new workflows: TheiaViral_Illumina_PE, TheiaViral_ONT, TheiaEuk_ONT, and Terra_2_ENA. Documentation updates and various bug fixes have also been implemented.
Full release notes can be found here!
Find our documentation here!
🆕 New workflows
- TheiaViral_ONT & TheiaViral_Illumina_PE
- These workflows generate de novo and consensus viral genome assemblies for either Oxford Nanopore (ONT) or Illumina paired-end (PE) sequencing data. TheiaViral is generalized to accommodate diverse and segmented viral lineages, including: hantavirus, norovirus, rabies, influenza, HIV, herpes simplex, ebola, hepatitis C, etc.
- TheiaViral also enables rabies genotyping by introducing a new Lyssavirus rabies Nextclade dataset.
- TheiaViral implements de novo assembly and dynamic reference genome selection to enable consensus genome assembly. The preexisting TheiaCoV workflow is a consensus assembly and characterization pipeline specialized for a subset of viral lineages with static reference genomes.
- Import TheiaViral_Illumina_PE from Dockstore
- Import TheiaViral_ONT from Dockstore
- TheiaEuk_ONT
- TheiaEuk_ONT is an Oxford Nanopore Technologies (ONT) genome assembly and fungal genome characterization pipeline. TheiaEuk_ONT is currently intended for haploid fungal genomes. The TheiaEuk workflows support the de novo assembly, quality assessment, and characterization of fungal genomes. This version has been updated to accept basecalled Oxford Nanopore Technologies (ONT) reads as the primary input.
- Import TheiaEuk_ONT from Dockstore
- Terra_2_ENA
💡 We are confident in the functionality of this workflow, but were not able to source any partners for final UAT, as many sites do not actively submit to ENA. As such, if you are interested in using this workflow, we'd greatly appreciate working with you for feedback.
- Introduces a standalone workflow for submitting data to the European Nucleotide Archive. Currently supports prokaryotic and viral sample types.
- The workflow is structured in three phases: preparation, registration, and submission.
- The workflow begins by downloading the user's Terra data table and validates the contents against ENA requirements. Every submission to ENA requires mandatory fields for both sample metadata and raw read data.
- Before using the workflow, users must register a study with ENA to obtain a study accession number.
- Users should also review the documentation to determine which fields are required for their specific sample_type (currently supporting prokaryotic and viral samples) and add those fields to their Terra data table or input TSV file.
- Import Terra_2_ENA from Dockstore
🚀 Changes to existing workflows
- All TheiaCoV Workflows
- Added support for Measles virus analysis using Nextclade
- IRMA now handles cases when an assembly cannot be created
- Enable Nextclade for H5N1 samples with GenoFLU genotype D1.1
- Updated Pangolin docker image
- All TheiaProk Workflows
- Addition of task_arln_stats.wdl to include missing ARLN stat checks
- Allow for task_mlst to handle mischaracterization for certain E. coli schemes
- Abort ANI calculations if a table is only populated with 0 values
- Updated stxtyper docker image
- Updated task_kmerfinder documentation
- Updated AMRFinder docker image
- Updated GAMBIT Prokaryotic DB
- Implemented updates and changes to correspond to Phoenix / ARLN statistics criteria
- TheiaEuk_Illumina_PE
- Enabled kraken2 read classification
- TheiaEuk_Illumina_PE, TheiaProk_Illumina_PE, TheiaProk_Illumina_SE
- Deprecated shovill and replaced with digger_denovo subworkflow
- TheiaProk_Illumina_PE, TheiaProk_Illumina_SE, TheiaProk_ONT
- Handled edge case where no reads map in task_shigatyper
- TheiaProk_Illumina_PE
- Restricted Vibecheck to run only if sample is classified as O1 Vibrio.
- All ONT workflows that run Kraken2
- Implemented memory efficient parsing and exposed extra memory parameters
- Samples_to_Ref_Tree & All TheiaCoV workflows
- Created functionality to accept custom user-provided Nextclade datasets
- Updated Nextclade dataset tags and docker image
- BaseSpace_Fetch
- Set api_server to default address
- TBProfiler_tNGS_PHB
- Fixed percent coverage and average depth calculations for tNGS data
- Updated tbp_parser docker image
- NCBI-AMRFinderPlus_PHB
- Updated AMRFinder docker image
- Mercury_Prep_N_Batch_PHB
- Made organism case-insensitive, decoupled organism input from metadata population, and created metadata_organism as argument for populating this field
- GAMBIT_Query_PHB
- Update GAMBIT Prokaryotic DB
📚 Documentation Updates
- The Freyja page was improved
- The GAMBIT_Query_PHB page was updated and improved
- The TheiaProk Workflow Series page was improved
- The TheiaEuk Workflow Series page was improved
- Added SOPs
- Update workflow relationships diagram
What's Changed
- [Documentation] Update Freyja documentation page by @ss43 in #816
- [Documentation] create modular documentation functionality and consolidate input/output tables by @sage-wright in #824
- [ARLN] Addition of task_arln_stats.wdl to include missing ARLN stat checks by @awh082834 in #821
- [MLST] Allow for task_mlst to handle mischaracterization for certain ecoli schemes by @awh082834 in #819
- [kraken2_parse_classified] Implement memory efficient parsing and expose parameters by @MrTheronJ in #820
- [Shovill] deprecate shovill and introduce digger_denovo subworkflow by @Michal-Babins in #823
- [MUMmer ANI] Abort ANI extraction if a table is only populated with 0 values by @xonq in #830
- [Basespace_Fetch] Set api_server to default address by @awh082834 in #831
- [Documentation] Widespread macro implementation for the Quick Facts section by @sage-wright in #832
- [KmerFinder, TheiaEuk] Docs update to include links and more in depth description by @awh082834 in #841
- [STXtyper] Update to container version 1.0.42 by @awh082834 in #840
- [Nextclade] Enable custom datasets by @xonq in #833
- [Documentation] Add SOPs and update the render_tsv_table macro implementation by @sage-wright in #838
- [Documentation] install pandas so that documentation can build by @sage-wright in #843
- [Requester Pays] transition all resources to the theiagen-public-resources-rp bucket by @sage-wright in #839
- [TheiaProk] Restrict Vibecheck to run on O1 V. cholerae only by @sage-wright in #842
- [tbp_parser] Fixing percent coverage calculation for tNGS analysis by @MrTheronJ in #844
- [Documentation] Update workflow relationships diagram, fix a few broken links, various tidying by @sage-wright in #849
- [Nextclade] Version and DS tag update by @awh082834 in #850
- [TheiaEuk] TheiaEuk ONT Workflow by @Michal-Babins in #644
- [AMRFinder] AMRFinder update to v4.0.23 and database v2025-06-03.1 by @awh082834 in #854
- [Terra_2_ENA] Workflow for submitting data to the European Nucleotide Archive by @MrTheronJ in #826
- [TheiaProk] Handle edge case where no reads map in task_shigatyper by @MrTheronJ in #851
- [Nextclade & Measles] Adding Measles functionality to TheiaCov by @awh082834 in h...
v3.0.1
Public Health Bioinformatics v3.0.1 Patch Release Notes
🩹 This patch release fixes several bugs that were identified
Find our full release notes here!
Find our documentation here!
Bug Fixes
- A bug in TheiaProk_ONT assembly polishing with short reads has been fixed.
- A bug in Dorado_Basecalling (when the number of chunks equaled the number of POD5 files) has been fixed.
- Ash O’Farrell fixed a bug in the tbp-parser task; the fix prevents division by zero when a sample has extremely low coverage. Thanks, Ash!
- The tbp_parser_min_frequency and tbp_parser_min_percent_coverage variables are now appropriately Floats and not Integers. You can provide as many decimals as you want now (but only one per value, in accordance with the rules of math).
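The kind of guard described in the tbp-parser fix above can be sketched as follows. This is an illustration only, not tbp-parser's code: the release notes do not say where the division occurred, so the percent_coverage and passes_threshold functions and their arguments are hypothetical.

```python
def percent_coverage(positions_at_depth: int, region_length: int) -> float:
    """Percent of a region covered at or above a depth threshold.
    Returns 0.0 instead of raising ZeroDivisionError when the denominator
    is zero, as can happen for samples with extremely low coverage."""
    if region_length <= 0:
        return 0.0
    return 100.0 * positions_at_depth / region_length

def passes_threshold(value: float, minimum: float) -> bool:
    """Compare against a float threshold, so fractional cutoffs such as 99.5 work."""
    return value >= minimum

low = percent_coverage(0, 0)        # would divide by zero without the guard
ok = percent_coverage(950, 1000)
print(low, passes_threshold(low, 99.5))
print(ok, passes_threshold(ok, 94.5))
```

Typing thresholds as floats, as the second bullet describes, is what allows a cutoff like 94.5 rather than forcing a whole number.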
Future Proofing
- We’ve made a few changes to our workflows to permit compatibility with Terra changing to use the Google Batch API (see the details regarding these changes from Terra here).
- We have ended developmental support for the Nullarbor_PHB workflow. This workflow will be available for PHB v3.0.0 and below in perpetuity but will no longer be updated.
Other Updates
- The AMR_Search task has been added to the TheiaProk and TheiaEuk workflows! To run AMR_Search, set the new input parameter run_amr_search to True.
- The following Docker images have been updated:
- AMRFinderplus → us-docker.pkg.dev/general-theiagen/staphb/ncbi-amrfinderplus:4.0.19-2024-12-18.1
- MLST → us-docker.pkg.dev/general-theiagen/staphb/mlst:2.23.0-2024-12-31
- tbp-parser → us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.4.5
- Pangolin → us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33
- Various documentation updates have been made and broken links fixed.
What's Changed
- [Merlin_Magic] Integrate AMR_Search_PHB into Merlin_Magic by @awh082834 in #797
- Handle edge case of extremely low coverage by @aofarrel in #798
- [AMRFinderPlus] Docker update to ncbi-amrfinderplus:4.0.19-2024-12-18.1 by @awh082834 in #799
- [MLST] update docker container by @xonq in #800
- [Documentation] Various updates by @sage-wright in #807
- [tbp-parser] update version by @MrTheronJ in #803
- Path Correction for Google Batch API Changes by @sage-wright in #806
- [VADR] Update mem for gcp batch by @Michal-Babins in #808
- [Pangolin] Update to 4.31-pdata-1.33 by @awh082834 in #811
- [AMR_Search & Merlin_Magic] Address style guide issues and formatting by @awh082834 in #813
- [v3.0.1] Update versioning by @sage-wright in #810
- [TheiaProk] update export_taxon_table and documentation by @sage-wright in #814
Full Changelog: v3.0.0...v3.0.1
v3.0.0
Public Health Bioinformatics v3.0.0 Major Release
This major release adds four new workflows, updates the assembly algorithm for TheiaProk_ONT, and makes significant improvements to many workflows. Documentation updates and various bug fixes have also been implemented.
Full release notes can be found here!
Find our documentation here!
🆕 New workflows:
-
- This workflow processes ONT sequencing data to identify genetic variations compared to a reference genome. Clair3 is a small variant caller for long reads and is now the preferred option in the latest version of the Artic pipeline. This update allows for compatibility with newer basecalling models and changes the default model to r1041_e82_400bps_hac_v420. Other supported Clair3 models can be found here
- Import this workflow from Dockstore
-
- This workflow mimics kSNP3 but uses the latest version — called kSNP4. Algorithmically, there are no significant changes between kSNP3 and kSNP4. The main benefit of upgrading to kSNP4 is that it is more optimized and reduces overall runtime. Since the software changed names, we made a new workflow so the names could match.
- Import this workflow from Dockstore
-
- This is a standalone workflow for PathogenWatch AMR-search; future plans include adding this workflow to TheiaProk. It encapsulates the functionality of the AMR portion of PathogenWatch, allowing users to retain that functionality without housing their data with an external source. It utilizes PAARSNP to infer resistance phenotypes. Currently, functionality is limited to a small number of species.
- Import this workflow from Dockstore
-
- This workflow performs GPU-accelerated basecalling on Oxford Nanopore POD5 files. The user uploads all of the raw POD5 files to their workspace (or alternative Google Cloud Storage bucket location), where the workflow will identify, basecall, demultiplex, and optionally trim the POD5 files into FASTQs. The resulting FASTQ files will be added to an existing or new Terra table with one row for each barcode.
- Import this workflow from Dockstore
🚀 Changes to existing workflows:
- All Genomic Characterization Workflows
- Language has been standardized
- skip_screen actually skips the read screen task
- All read_screen checks are run and the results are output to a file
- All ONT Workflows
- Additional inputs for read_qc_trim are now available for modification
- All TheiaCoV Workflows
- Nextclade has been updated to v3.10.2
- Default Nextclade dataset tags have been updated
- Pangolin has been updated to 4.3.1-pdata-1.32
- TheiaCoV_Illumina_PE and TheiaCoV_ONT: Influenza Characterization
- IRMA has been updated to v1.2.0 and new inputs and outputs are now available
- MIRA standards are now used for default values in IRMA
- GenoFLU has been updated to v1.06, a new input is available, and it only runs when the clade is 2.3.4.4b.
- Custom databases can now be provided to Nextclade for B3.13 H5N1 samples
- Nextclade now runs on H5N1 samples if a custom dataset file is provided
- TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE
- iVar variants no longer fails due to lack of variant identification
- TheiaEuk_Illumina_PE
- The GAMBIT fungal database has been updated to v1.0.0
- RASUSA has been updated to v2.1.0
- A few C. auris cladetyping outputs were renamed
- All TheiaProk Workflows
- export_taxon_table inputs removed from Terra
- Bakta database customization is now available
- hicap optional parameters are now available for modification (for Haemophilus influenzae)
- sonneityping no longer fails due to sonnei typing disagreement (for Shigella sonnei)
- StxTyper has been updated to v1.0.40 and new outputs are available (for Shigella spp., but can be run on any sample)
- SeqSero2 has been updated to v2.1.3.1 and new outputs are available (for Salmonella spp.)
- SISTR has been updated to v1.1.3 and new outputs are available (for Salmonella spp.)
- TBProfiler has been updated to v6.6.3 (for Mycobacterium tuberculosis)
- tbp-parser has been updated to v2.4.4 and a new input is available (for Mycobacterium tuberculosis)
- TheiaProk_Illumina_PE
- vibecheck added for Vibrio cholerae characterization
- TheiaProk_ONT
- The assembly process was transitioned away from dragonflye in favor of a non-wrapped individual tool implementation
- The plasmid detection tool, Tiptoft, was deprecated
- RASUSA has been updated to v2.1.0
- All Freyja Workflows
- Freyja has been updated to v1.5.3
- A reference GFF can now be provided to Freyja_FASTQ and the primer_bed input is now optional
- Additional output columns have been created for Freyja_FASTQ from the results file
- The pathogen flag has been added to all Freyja commands in Freyja_FASTQ
- All Phylogenetic Workflows
- The reorder_matrix task output file suffix was changed to be more general
- Augur_PHB
- The augur_clades task no longer runs if an augur_clades_tsv is provided
- Snippy_Streamline and Snippy_Streamline_FASTA
- When include_gbff is true, the GBFF file is now used as the reference
- Viral genomes can be retrieved (for a reference) by setting use_ncbi_virus to true
- The centroid genome can be used as the reference by setting use_centroid_as_reference to true
- Kraken_PE, Kraken_SE, and TheiaMeta_Illumina_PE
- krona was updated to be compatible with viral data
- Mercury_Prep_N_Batch
- Some metadata columns can now be populated from workflow inputs
- Terra_2_NCBI
- A custom mapping file for tables with different column headers is available
- Assembly_Fetch_PHB
- Viral genomes can be retrieved by setting use_ncbi_virus to true.
- SRA_Fetch
- Workflow versioning information is now output
- Cauris_Cladetyper
- Outputs were renamed
- RASUSA_PHB
- RASUSA has been updated to v2.1.0
- TBProfiler_tNGS_PHB
- The bases_to_crop variable default has been set to 0.
- TBProfiler has been updated to v6.6.3
- tbp-parser has been updated to v2.4.4 and a new input is available
- TheiaValidate_PHB
- TheiaValidate has been updated to v1.1.2
📖 Documentation updates
- Formalized the philosophy of workflow failures - see “Getting Started”
- Created guides for GAMBIT, GAMBIT database creation, phylogenetics, and custom organisms through TheiaCoV
- A SARS-CoV-2 wastewater metadata formatter was added to Terra_2_NCBI
- Fixed input table in the Augur_PHB documentation
- Added documentation for the qc_check task in TheiaCoV
- Many dead links have been fixed
- The TheiaMeta documentation page was improved
- The Freyja documentation page was improved
- The Snippy_Streamline and Snippy_Streamline_FASTA pages were improved
- The data export workflows were improved and standardized — these include Concatenate_Column_Content, Transfer_Column_Content, and Zip_Column_Content
- The data import workflows were improved and standardized - these include Assembly_Fetch, BaseSpace_Fetch, Create_Terra_Table, and SRA_Fetch.
- Sources added to the influenza antiviral resistance detection module
- The new workflow documentation template was improved
- We’ve fixed our formatting in multiple places and standardized our language in others.
What's Changed
- [Documentation] Fix overflow in Augur documentation by @sage-wright in #706
- [GenoFLU] update docker to v1.05 by @sage-wright in #704
- [Documentation] add qc check documentation to TheiaCoV by @sage-wright in #701
- [Cauris_Cladetyper] Various improvements and removal of old TheiaCauris references by @sage-wright in #700
- [read_QC_trim_ONT] add additional inputs for user modification by @sage-wright in #702
- [Read_Screen] skip_screen actually skips the screen by @sage-wright in #699
- [SRR_Fetch] update doc links by @fraser-combe in #719
- [Documentation] SC2 Wastewater metadata formatter by @frankambrosio3 in #721
- [Documentation] Adding philosophy of workflow failures page by @sage-wright in https://github.com/theiagen/p...
v2.3.0
Public Health Bioinformatics v2.3.0 Minor Release
This minor release adds two new workflows, Fetch_SRR_Accession_PHB and Concatenate_Illumina_Lanes_PHB, and makes significant improvements to the TheiaCoV, TheiaEuk, TheiaProk, and TheiaMeta workflow series. Documentation updates and various bug fixes have also been implemented.
Full release notes can be found here!
Find our documentation here!
🆕 New workflows
-
Concatenate_Illumina_Lanes_PHB
- Some Illumina sequencing platforms produce FASTQ files split across multiple lanes for a single sample. This workflow combines multi-lane FASTQ files from Illumina sequencing runs into a single read1 and read2 file per sample. It is ideal when data from multiple lanes must be combined before analysis workflows such as assembly or variant calling, because it ensures that downstream workflows receive consolidated FASTQ files.
- This workflow is designed to run automatically at the start of the TheiaProk workflow if multi-lane FASTQ files are provided (e.g., read1_lane2.fastq.gz and read2_lane2.fastq.gz)
- Import this workflow from Dockstore
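The lane-merging step can be illustrated with a short Python sketch. This is not the workflow's own implementation: the function names and the demo file names are hypothetical, and the sketch simply relies on the fact that gzip files appended byte-for-byte form a valid multi-member gzip stream.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concatenate_lanes(lane_files, merged_path):
    """Append each lane's gzip file to merged_path, in the order given."""
    with open(merged_path, "wb") as merged:
        for lane_file in lane_files:
            with open(lane_file, "rb") as lane:
                shutil.copyfileobj(lane, merged)
    return merged_path


def demo():
    """Write two tiny hypothetical lane files, merge them, return the merged text."""
    with tempfile.TemporaryDirectory() as tmp:
        lanes = []
        for lane_number in (1, 2):
            path = Path(tmp) / f"read1_lane{lane_number}.fastq.gz"
            with gzip.open(path, "wt") as handle:
                handle.write(f"@read_from_lane{lane_number}\nACGT\n+\nIIII\n")
            lanes.append(path)
        merged = concatenate_lanes(lanes, Path(tmp) / "read1_merged.fastq.gz")
        with gzip.open(merged, "rt") as handle:
            return handle.read()
```

For paired-end data the same call would be made once for the read1 files and once for the read2 files, with the lanes listed in the same order so that mates stay aligned.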
-
Fetch_SRR_Accession_PHB
- This workflow will retrieve any Sequence Read Archive (SRA) accessions (SRR) associated with a given sample accession, such as a BioSample ID (e.g., "SAMN00000000") or SRA Experiment ID (e.g., "SRX000000").
- This process utilizes the fastq-dl tool to fetch metadata from SRA and outputs the corresponding SRR accession(s).
- If multiple SRR accessions are linked to a single sample, the workflow will output them as a comma-separated list.
- This workflow is particularly useful for retrieving SRR accessions a few days after running Terra_2_NCBI workflows.
- Import this workflow from Dockstore
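As a rough illustration of the comma-separated output described above, the sketch below collects run accessions from a list of metadata rows. The field names (biosample_accession, experiment_accession, run_accession) and the function name are hypothetical and are not taken from fastq-dl's actual output schema.

```python
def srr_accessions_for_sample(metadata_rows, sample_accession):
    """Return the run accessions linked to sample_accession as a comma-separated string."""
    runs = []
    for row in metadata_rows:
        # A row matches if the query is either its BioSample or its Experiment ID.
        linked = (row.get("biosample_accession"), row.get("experiment_accession"))
        run = row.get("run_accession")
        if sample_accession in linked and run and run not in runs:
            runs.append(run)
    return ",".join(runs)
```

An empty string is returned when no run is linked to the sample, which keeps the output type the same in every case.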
🚀 Changes to existing workflows
-
All Genomic Characterization Workflows
- The read screen is now compatible with Dorado-produced FASTQ files
-
All Illumina Workflows
- fastq_scan has been updated to the latest version
-
All TheiaCoV Workflows
- The percentage of mapped reads is now output in all TheiaCoV workflows (except TheiaCoV_FASTA)
- The default Nextclade dataset tags have been updated for SC2, mpox, flu, RSV-A, and RSV-B
- The default Pangolin docker is now us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.31
- Kraken2 standalone is now used and databases must be provided.
-
TheiaCoV_Illumina_PE and TheiaCoV_ONT
- Default parameters have been set for H5N1 flu
- IRMA-assembled flu segments are now output in sorted order (largest to smallest)
-
All TheiaEuk Workflows
- Additional genes for Candida auris are now examined by default in the Snippy_Gene_Query task
- Bug fix to the snippy_variants_num_variants output column for Cryptococcus neoformans
-
TheiaMeta_Illumina_PE
- MIDAS is now an optional task in TheiaMeta.
-
All TheiaProk Workflows
- stxtyper was added to all TheiaProk workflows
-
TheiaProk_Illumina_PE and TheiaProk_Illumina_SE
- Multi-lane Illumina data can now be used as input natively.
-
TheiaProk_Illumina_PE and TheiaProk_ONT
- TBProfiler has been updated to v6.4.1
- tbp-parser has been updated to v2.2.2
-
Augur_PHB
- Versioning information for the tree-building tools is now available
-
All Freyja Workflows
- Freyja now supports non-SARS-CoV-2 organisms natively.
-
Mercury_Prep_N_Batch
- Errors no longer occur when data has been previously transferred
- The correct information is now provided for GISAID’s covv_coverage column for ClearLabs data
- Failures now cause the task to fail instead of passing silently
-
Snippy Workflows
- A new file with QC metrics has been created
- Additional QC metrics are now output
-
Terra_2_NCBI_PHB
- Collection dates will no longer have decimals
📚 Documentation Updates
- Search tables better with table-specific search bars
- Dead links removed
- Generally improved documentation
What's Changed
- [Documentation] Updated Snippy variants output documentation by @fraser-combe in #623
- [TheiaCoV] iVar Consensus Pipefail fix by @Michal-Babins in #629
- [TheiaProk] expose sistr optional param inputs to theiaProk wfs by @fraser-combe in #603
- [Documentation] fix broken links by @sage-wright in #627
- Snippy_Variants: Calculate % reads aligned by @fraser-combe in #616
- [Augur +TheiaCoV] Enable H5N1 flu subtype augur & nextclade by @Michal-Babins in #640
- [TheiaMeta] Midas call in read_QC_trim_pe.wdl workflow and outputs by @fraser-combe in #619
- [TheiaCoV] Reorder flu segments from largest to smallest in irma task by @Michal-Babins in #635
- [Mercury] prevent silent failures by @sage-wright in #648
- Fixed theiacov documentation to specify assembly order by @Michal-Babins in #652
- [TheiaCov & TheiaProk & TheiaEuk] read screen ONT bugfix and improvements by @kapsakcj in #650
- [TheiaCoV ONT and Clearlabs] Update consensus task container to artic:1.2.4-1.12.0 by @cimendes in #636
- [Documentation] Search bar for tables within docs by @fraser-combe in #646
- [TheiaEuk] Additional genes for Snippy_Gene_Query by @sage-wright in #647
- [MerlinMagic] Fixed output for crypto snippy_variants_num_variants by @Michal-Babins in #654
- [Documentation] type error correction theiacov wf by @fraser-combe in #660
- [TheiaProk] Adds stxtyper to merlin_magic and TheiaProk wfs by @kapsakcj in #525
- [Mercury] bump mercury docker to 1.0.9: bugfix for GISAID metadata covv_coverage column by @kapsakcj in #661
- [TheiaCov] wfs add percentage_mapped_reads by @fraser-combe in #641
- [Documentation] Update MIDAS database documentation in TheiaProk by @fraser-combe in #667
- Add Snippy_Variants QC outputs to Snippy_Tree and Snippy_Sreamline workflow outputs by @jrotieno in #592
- [TheiaCoV/TheiaProk/TheiaMeta/TheiaEuk/Freyja_FASTQ] fastq-scan updates & improvements. Adding JSON as wf output file by @kapsakcj in #662
- Prevent Silent Errors by @sage-wright in #666
- [Augur] Add augur tree iqtree model type to output by @Michal-Babins in #674
- [Terra2NCBI] Force collection_date to be a string by @cimendes in #658
- [Documentation] Update code contribution guidelines by @fraser-combe in #675
- [Retrieve_SRR_Metadata] New wf to retrieve SRR after Terra2NCBI wf by @fraser-combe in #668
- Documentation Update by @frankambrosio3 in #678
- [Documentation] Various updates by @sage-wright in #680
- [TheiaCoV] Update nextclade dataset tags and pangolin docker version by @Michal-Babins in #679
- [Documentation] update dataset tags by @Michal-Babins in #681
- [TheiaCoV] Split database from Kraken2_TheiaCoV task by @cimendes in #670
- [TheiaCoV] Update nextclade dataset tag for H5N1 to the latest version by @Michal-Babins in #683
- [Freyja] Update freyja to version 1.5.2, expose pathogen flag and minor update to docs by @cimendes in #684
- [Augur] Expose Augur versions by @Michal-Babins in #686
- [TheiaProk] Update default versions for TB-Profiler and tbp-parser by @sage-wright in #673
- v2.3.0 final changes by @sage-wright in #693
- [Concatenate_Illumina_Lanes] Fix bug when single-end only by @sage-wright in https://gith...
v2.2.1
Public Health Bioinformatics v2.2.1 Patch Release Notes
🩹 This patch release fixes the output names for the NCBI-Scrub standalone workflows.
Our documentation has also been migrated to GitHub for easier maintenance.
Full release notes can be found here!
Find our documentation here!
What's Changed
- [Documentation] Transfer all PHB documentation to GitHub by @sage-wright in #605
- [NCBI Scrub Standalone Workflows] Correct output declarations for the number of spots removed by @cimendes in #610
- [v2.2.1] update version tag by @sage-wright in #622
Full Changelog: v2.2.0...v2.2.1
v2.2.0
Public Health Bioinformatics v2.2.0 Minor Release Notes
This minor release adds two new workflows, Create_Terra_Table_PHB and Snippy_Streamline_FASTA_PHB, and makes significant improvements to the TheiaProk, TheiaCoV, TheiaMeta, and Freyja workflow series. Additionally, several bug fixes have been made.
Full release notes can be found here!
Find our documentation here!
🆕 New workflows:
-
Create_Terra_Table_PHB
- The manual creation of Terra tables can be tedious and error-prone. This workflow will automatically create your Terra data table when provided with the location of the files. It can import assembly, paired-end (Illumina) and single-end (Illumina and Oxford Nanopore) data.
- Import the workflow from Dockstore.
-
Snippy_Streamline_FASTA_PHB
- Since Snippy_Variants_PHB is now compatible with assembled sequences as input in FASTA format, we have developed Snippy_Streamline_FASTA, an all-in-one approach to generating a reference-based phylogeny using the Snippy tools, mirroring the Snippy_Streamline_PHB workflow. By default, it runs Snippy_Variants and Snippy_Tree, but will optionally run Assembly_Fetch if a reference genome is not provided.
- Import the workflow from Dockstore.
🚀 Changes to existing workflows:
-
All TheiaProk Workflows
- Genomic characterization with emmtyper is now enabled for Streptococcus pyogenes. (Thanks, @sam-baird!)
- When call_ani is true, failures will no longer occur if multiple hits have the same score.
- Support for Vibrio parahaemolyticus, Vibrio vulnificus and Enterobacter asburiae was added to the AMRFinderPlus task
- VirulenceFinder now runs on Shigella sonnei samples.
- The Docker containers for AMRFinderPlus, tbp-parser and mlst have been updated:
  - AMRFinderPlus: 3.12.8-2024-07-22.1
  - tbp-parser: tbp-parser:1.6.0
  - mlst: 2.23.0-2024-08-01
- Genomic characterization can now be skipped by setting the new optional input perform_characterization to false.
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
- Optional inputs are now available for all tasks within the merlin_magic subworkflow.
-
All TheiaCoV Workflows
- GenoFLU has been added for H5N1 influenza typing.
- Additional VADR output files have been exposed:
  - File? vadr_feature_tbl_pass
  - File? vadr_feature_tbl_fail
  - File? vadr_classification_summary_file
  - File? vadr_all_outputs_tar_gz
- Aligned FASTQs no longer contain supplemental/secondary alignments.
-
TheiaCoV_Illumina_PE_PHB and TheiaCoV_ONT_PHB
- The workflow will no longer fail if an assembly cannot be produced. The assembly_fasta column will say "Assembly could not be generated".
-
TheiaEuk_Illumina_PE_PHB
- TheiaEuk no longer abruptly fails if an organism outside of the expected list of taxa is detected by GAMBIT.
- All optional inputs and docker containers for taxa-specific sub-modules have been exposed.
-
All ONT workflows (TheiaProk and TheiaCoV)
- KMC is no longer used for genome-size prediction. Instead, for TheiaProk, the expected genome length is now set to 5 Mb, which is around 0.7 Mb larger than the average bacterial genome length. For TheiaCoV, species have default genome lengths associated with their organism tag.
-
TheiaCoV and TheiaMeta workflows
- The human read removal tool (HRRT) has been updated to v2.2.1. For paired-end data, reads are first interleaved to guarantee that no mates are orphaned by this tool.
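To show why interleaving helps keep mates together, here is a minimal Python sketch. It is not HRRT's code or the workflow's task: the function names are hypothetical, and the rule of dropping both mates when either one is flagged is an assumption made for illustration.

```python
from itertools import chain


def read_fastq_records(lines):
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def interleave(read1_records, read2_records):
    """Alternate mates so each pair sits in adjacent positions: r1, r2, r1, r2, ..."""
    if len(read1_records) != len(read2_records):
        raise ValueError("read1 and read2 must contain the same number of records")
    return list(chain.from_iterable(zip(read1_records, read2_records)))


def drop_flagged_pairs(interleaved, is_flagged):
    """Assumed rule: remove both mates when either mate is flagged."""
    kept = []
    for i in range(0, len(interleaved), 2):
        mate1, mate2 = interleaved[i], interleaved[i + 1]
        if not (is_flagged(mate1) or is_flagged(mate2)):
            kept.extend([mate1, mate2])
    return kept


def deinterleave(records):
    """Split an interleaved record list back into read1 and read2 lists."""
    return records[0::2], records[1::2]
```

Because pairs are always handled as a unit, the two lists returned by deinterleave have the same length and the same read order.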
-
All Freyja Workflows
- Freyja has been updated for all workflows to version 1.5.1.
- SARS-CoV-2 UShER barcodes file is now a .feather file.
- Freyja_FASTQ_PHB is now compatible with Illumina paired-end, Illumina single-end and Oxford Nanopore data. A new input ont has been added to control workflow behavior.
- The UShER barcodes and lineage files used are now exposed as outputs in Freyja_FASTQ_PHB
-
Snippy_Variants_PHB
- In addition to paired-end and single-end reads, assemblies are now accepted as input. To use Illumina sequencing data, pass the forward reads to the read1 optional input and, optionally, the reverse reads to read2. To use assembled genomes, provide the assembly_fasta input and omit read1 and read2.
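The input combinations for Snippy_Variants_PHB can be summarized in a small Python sketch. The workflow itself is not written in Python, so this only mirrors the rules stated above; the function name and the choice to raise an error on conflicting inputs are assumptions.

```python
def select_input_mode(read1=None, read2=None, assembly_fasta=None):
    """Classify the inputs as 'assembly', 'paired-end', or 'single-end'."""
    if assembly_fasta is not None:
        # Assumed behavior: reads and an assembly are mutually exclusive.
        if read1 is not None or read2 is not None:
            raise ValueError("Provide either assembly_fasta or reads, not both")
        return "assembly"
    if read1 is None:
        raise ValueError("Provide read1 (and optionally read2) or assembly_fasta")
    return "paired-end" if read2 is not None else "single-end"
```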
-
SRA_Fetch_PHB
- SRA-Lite files are now detected when a file is of low quality.
-
Augur_PHB
- mpox mutation context has been added to the auspice_input_json output, which displays the fraction of G->A or C->T mutations.
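The fraction mentioned above can be computed as in the following Python sketch. It is only an illustration of the arithmetic, not Augur's implementation; the function name and the representation of substitutions as (reference base, alternate base) pairs are assumptions.

```python
def ga_ct_fraction(substitutions):
    """Return the fraction of substitutions that are G->A or C->T."""
    normalized = [(ref.upper(), alt.upper()) for ref, alt in substitutions]
    if not normalized:
        # No substitutions: report 0.0 rather than dividing by zero.
        return 0.0
    targets = {("G", "A"), ("C", "T")}
    hits = sum(1 for pair in normalized if pair in targets)
    return hits / len(normalized)
```

For example, a branch carrying one G->A, one C->T, one A->G, and one T->C substitution gives a fraction of 2 / 4 = 0.5.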
-
GAMBIT_Query_PHB
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
-
Mercury_Prep_N_Batch_PHB
- Mercury has been moved to its own repository at https://github.com/theiagen/mercury.
- Mercury now processes BioSample & SRA metadata for flu
What's Changed
- [TheiaProk] Add emmtyper task for Streptococcus pyogenes by @sam-baird in #524
- [SRA-Fetch] Detect SRA-Lite when it's low quality file by @cimendes in #512
- Adding the Create_Terra_Table_PHB workflow by @sage-wright in #533
- [Create_Terra_Table] recognize fastq files that end in .fq by @sage-wright in #535
- [TheiaProk - ANI] prevent failures when multiple top hits have the same score by @sage-wright in #532
- [TheiaCoV] Flu: Prevent workflow failures when assembly cannot be produced; generate NanoPlot outputs regardless of assembly success by @sage-wright in #530
- [theiaprok] amrfinderplus: add support for Vibrio parahaemolyticus, Vibrio vulnificus, Enterobacter asburiae. Fix C diff bug by @kapsakcj in #542
- [TheiaCoV] Add GenoFLU for flu whole-genome genotyping by @sage-wright in #540
- [TheiaProk] Merlin_magic subwf bugfix: run virulencefinder on Shigella sonnei by @kapsakcj in #543
- [TheiaCoV and TheiaMeta] Update hrrt (ncbi-scrub) to version 2.2.1 and optimise task by @cimendes in #527
- [TheiaCoV and TheiaMeta - HRRT] Patch bug by removing unneeded awk verification by @cimendes in #550
- Create CODEOWNERS by @AndrewLangvt in #554
- [TheiaProk] Add additional input enabling characterization by @sage-wright in #547
- Updating templates & broken links in the readme by @sage-wright in #555
- [TheiaEuk] Fix bug where String outputs were being passed as File for Snippy_variants by @cimendes in #574
- [TheiaProk] update tbp-parser to latest version by @sage-wright in #576
- [Create_Terra_Table] fix bug, and enable ability for users to provide their own file ending suffixes by @sage-wright in #575
- [theiacov] Add additional vadr output files & tarball; upgrade VADR docker by @kapsakcj in #556
- [ONT] Remove KMC by @sage-wright in #578
- [Create_Terra_Table] fix sample name i...
v2.1.0
Public Health Bioinformatics v2.1.0 Minor Release Notes
This minor release improves the utility and usability of several Oxford Nanopore Technologies’ dedicated workflows for viral and bacterial genomic characterization (TheiaCoV and TheiaProk). Additionally, support for new organisms has been added to several workflows.
Full release notes can be found here!
Find our documentation here!
🚀 Changes to existing workflows:
-
All TheiaProk Workflows
- General Abricate is now available through the call_abricate and abricate_db optional inputs.
- Abricate specifically for Vibrio cholerae is now available. It launches automatically if the gambit_predicted_taxon or expected_taxon is Vibrio cholerae.
- A new optional parameter separate_betalactam_genes is now available that splits AMRFinderPlus beta-lactam hits into new columns.
- The call_midas optional input is now set to false by default.
-
TheiaProk_Illumina_PE
- New read quality-control outputs have been added: r1_mean_q_clean, r2_mean_q_clean, r1_mean_readlength_clean and r2_mean_readlength_clean.
-
TheiaProk_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
- Kraken2 is now available through the call_kraken and kraken_db optional inputs.
- A maximum genome size of 10 Mbp is set to prevent excessive runtimes.
-
All TheiaCoV Workflows
- RSV-A and RSV-B are now able to be analyzed with the TheiaCoV workflows. Nextclade characterization and Kraken taxonomic analysis will now be run on RSV samples.
- The following default organisms now have the following Nextclade dataset tags:

| Organism | New default Nextclade dataset tag |
| --- | --- |
| SARS-CoV-2 | "2024-06-13--23-42-47Z" |
| mpox | "2024-04-19--07-50-39Z" |
| Flu H1N1 HA | "2024-04-19--07-50-39Z" |
| Flu H1N1 NA | "2024-04-19--07-50-39Z" |
| Flu H3N2 HA | "2024-04-19--07-50-39Z" |
| Flu H3N2 NA | "2024-04-19--07-50-39Z" |
| Flu Victoria HA | "2024-04-19--07-50-39Z" |
| Flu Victoria NA | "2024-04-19--07-50-39Z" |
-
TheiaCoV_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
-
TheiaCoV Flu Track
- All of the flu-specific tasks now live in their own sub-workflow, flu_track. This has no effect on the end-user.
- In TheiaCoV_ONT, flu samples will now have the assembly mean coverage of both the HA and NA segments appear in the assembly_mean_coverage output variable. This reflects the behaviour already present in TheiaCoV_Illumina_PE.
- The all-segments FASTA header lines now include samplename.
- The new output irma_subtype_notes now indicates if IRMA was able to determine the flu subtype
- All workflows now use abricate_flu_subtype (instead of irma_subtype) for selecting the appropriate nextclade_dataset_tag.
- Nextclade output columns for flu now explicitly state either HA or NA.
- Padded assemblies, in which any - or . characters present in the final assembly file are removed or replaced by N, respectively, are now provided to MAFFT and VADR to prevent task failures.
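The padding cleanup described for the flu track (removing - characters and replacing . with N) can be sketched in Python as follows. This is an illustrative sketch rather than the task's actual code; the function names are hypothetical, and header lines are left untouched by assumption.

```python
def clean_padded_sequence(sequence):
    """Remove '-' padding and replace '.' padding with 'N'."""
    return sequence.replace("-", "").replace(".", "N")


def clean_padded_fasta(fasta_text):
    """Apply the cleanup to sequence lines only, keeping '>' header lines as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(clean_padded_sequence(line))
    return "\n".join(cleaned) + "\n"
```

Skipping header lines matters because sample names often contain dashes or dots that should not be altered.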
-
Terra_2_NCBI
- Skipping BioSample submission via the skip_biosample optional input now also skips the requirement to have BioSample metadata in your data table.
-
Augur_Prep_PHB and Augur_PHB
- RSV-A and RSV-B can now be analyzed with the Augur workflows.
- Metadata is no longer required to run Augur. Only a distance tree will be created if metadata is not provided.
-
kSNP3 and other phylogenetic inference workflows
- Outputs from phylogenetic workflows (SNP matrices) and the summarize_data task will now have a properly toggleable Phandango coloring suffix.
- The phandango_coloring optional input is now off by default.
Docker container updates:
- IRMA has been updated to version v1.1.5
- AMRFinderPlus has been updated to version v3.12.8-2024-05-02.2
- ts_mlst database has been updated as of 2024-06-01
- Pangolin database has been updated to pdata v1.27
🐛 Bug fixes and small improvements:
- TheiaProk_ONT and TheiaProk_FASTA: Hicap was being run in TheiaProk_ONT but the outputs were never appearing in the data table! This has been fixed.
- All TheiaCoV workflows: Unsupported organisms will no longer cause workflow failures.
- Terra_2_NCBI: Fixed a typo when using the Wastewater Biosample package that was causing an error.
- Freyja_Dashboard: The freyja_dasbhoard output variable now correctly says freyja_dashboard.
- Workflows that accept String inputs that are used to name things: Several input variables such as cluster_name now accept Strings with whitespace.
- All workflows: Runtime parameters have been adjusted for several tasks.
- TheiaCoV Flu Track: A bug has been fixed for IRMA running out of disk space. Additionally, another bug affecting Flu B samples was fixed related to empty HA segment FASTA files.
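For the whitespace handling in naming inputs such as cluster_name, the corresponding entry under What's Changed describes converting spaces to dashes. A minimal Python sketch of that idea follows; the function name, the trimming of leading and trailing whitespace, and the collapsing of repeated whitespace into one dash are assumptions, not confirmed workflow behavior.

```python
import re


def sanitize_name(name):
    """Trim the ends and replace each run of whitespace with a single dash."""
    return re.sub(r"\s+", "-", name.strip())
```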
What's Changed
- TheiaCoV wf support for RSV - run nextclade by default and small optimizations (kraken_target_organism, genome_length) by @kapsakcj in #436
- [New workflow - internal] Gambitcore for assembly quality assessment with GAMBIT by @cimendes in #466
- [TheiaProk_ONT and TheiaCoV_ONT] Expose additional QC metrics from nanoplot for both raw and clean reads by @cimendes in #452
- Exposing r1 and r2 mean_q_clean and mean_readlength_clean by @jrotieno in #455
- [TheiaProk_ONT] add patch fix to kmc estimated genome size to not go over 10Mbp by @cimendes in #459
- Add abricate as optional module by @jrotieno in #431
- [TheiaProk_ONT] Add Kraken2 as part of read_qc by @cimendes in #438
- [Flu] Assembly mean coverage & read screen clean-up by @sage-wright in #469
- [Freyja_Dashboard] fix typo in freyja_dashboard output File variable name by @AndrewLangvt in #482
- [Terra_2_NCBI] remove metadata requirements with skip_biosample == true by @sage-wright in #475
- Augur Updates for RSV-A and RSV-B by @jrotieno in #478
- [kSNP3] fix behaviour when phandango colouring is set to false by @cimendes in #496
- [Internal] Updating runtime parameters by @sage-wright in #494
- Automatically convert spaces to dashes in workflows that accept strings by @AndrewLangvt in #498
- [TheiaCoV] Enable user to run TheiaCoV with an unsupported organism by @sage-wright in #501
- [AMRFinderPlus] parse BETA-LACTAM genes and subclasses into individual output columns by @sage-wright in #505
- IRMA bug fixes & improvements; theiacov_illumina_pe wf updates for Flu by @kapsakcj in #468
- Augur_PHB: Set sample_metadata_tsvs input to optional by @jrotieno in #503
- [Internal - Gambitcore] Downgrade database to stable 1.3.0 version by @cimendes in #473
- [TheiaCoV_Illumina_PE & _ONT] Create sub-workflow for flu-specific modules by @sage-wright in #502
- [TheiaProk] Add abricate module for vibrio characterization by @cimendes in #429
- [TheiaProk] expose hicap outputs in theiaprok_fasta and theiaprok_ont by @cimendes in #508
- Fix typo in Terra_2_NCBI Wastewater metadata by @michellescribner in #519
- [TheiaProk] Update amrfinderplus to v3.12.8; DB: v2024-05-02.2; reduce compute resources by @kapsakcj in #514
- [TheiaProk] upgrade mlst docker image to 2024-06-01 staphb build; reduced runtime parameters; enable preemptible by @kapsakcj in #516
- update default...