02 Feb 18:46

xonq

4ea22c1

v4.1.0 Latest

This release expands automated quality control to new workflows and includes influenza segment thresholds; standardizes adapter, primer, and read trimming; and updates software versions. Documentation updates and various bug fixes are also implemented.

See more details regarding these changes here!

🚀 Changes to existing workflows

Changes to genomic characterization workflows

All Consensus Assembly Workflows

percent_mapped_reads is correctly calculated

All Illumina workflows

fastp Docker and JSON reports are now available outputs

All Viral Workflows

Nextclade dataset tags are updated
Pangolin is updated to version 4.3.4-pdata-1.37
IRMA flu aligned reads are now extracted and output

All TheiaCoV, all TheiaProk, and TheiaEuk Illumina PE workflows

QC Check is now case-insensitive and accepts FastQC read quality control as input
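The case-insensitive QC Check behavior can be pictured with the small Python sketch below. It is not the PHB `qc_check_task` code: the function name `qc_check`, the metric names, the (`"min"`/`"max"`, limit) threshold format, and the `PASS`/`ALERT` labels are all hypothetical. It only shows the idea of casefolding metric names on both sides before comparing, so a threshold written as `num_reads_raw1` still matches a metric reported as `NUM_READS_RAW1`.

```python
from __future__ import annotations


def _key(name: str) -> str:
    # Casefold and trim so metric names compare case-insensitively.
    return name.strip().casefold()


def qc_check(metrics: dict[str, float],
             thresholds: dict[str, tuple[str, float]]) -> tuple[str, list[str]]:
    """Compare sample metrics against thresholds, ignoring name case.

    Hypothetical format: thresholds maps a metric name to ("min" | "max", limit).
    Returns ("PASS" | "ALERT", list of human-readable failure messages).
    """
    by_key = {_key(name): value for name, value in metrics.items()}
    failures: list[str] = []
    for name, (kind, limit) in thresholds.items():
        value = by_key.get(_key(name))
        if value is None:
            # Metric not produced for this sample: skip it instead of failing.
            continue
        if kind == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
    return ("ALERT" if failures else "PASS"), failures


# A FastQC-style read count reported in upper case still matches the threshold.
status, reasons = qc_check({"NUM_READS_RAW1": 850.0}, {"num_reads_raw1": ("min", 1000.0)})
print(status, reasons)
```

Skipping metrics that a sample does not report is a design choice in this sketch; a stricter check could treat a missing metric as a failure instead.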

All TheiaCoV, all TheiaProk, Freyja, tbprofiler-tNGS, and TheiaEuk Illumina workflows

Support for adapter trimming via Trimmomatic is added

All TheiaProk workflows

Vibecheck is updated

All TheiaViral workflows

CheckV and Krakentools soft-fail if de novo assembly quality is low or if reads cannot be extracted

All TheiaViral Illumina workflows

TheiaViral now allows finer-grained read quality control inputs and defaults to fastp for read trimming (breaking change)

Freyja Workflows

Freyja is updated and its quality control features are expanded; the Freyja Update workflow is removed

TheiaCoV Illumina PE and TheiaCoV ONT

Segment-based QC Check for influenza is added

TheiaViral Illumina PE

Support for primer trimming is added

Changes to phylogenetic workflows

All Phylogenetic workflows

Summarize Data now matches specific names when generating its output table

Snippy_Tree

The IQ-TREE bootstraps input variable is renamed to ultrafast_bootstraps

Changes to data import workflows

BaseSpace_Fetch

More robust project/run ID file matching is implemented

📚 Documentation updates

All workflow input table default values are synchronized, including embedded defaults
New SOPs are added
TheiaViral VSP genomic characterization modules and outputs are given detailed explanations
Freyja Workflow Series diagram is updated

What's Changed

[PHB] Set defaults to automatically propagate to I/O tables by @xonq in #960
[Assembly_Metrics] Correctly calculate percent_mapped_reads by @xonq in #967
[Docs] Expand VSP output descriptions by @xonq in #975
[IQ-TREE] Specificity of Parameters by @awh082834 in #979
[TheiaViral | Read_QC_Trim_PE] Simplify TheiaViral and Theia* Illumina PE trimming by @xonq in #970
[TheiaProk | TheiaEuk | TheiaCoV] Enable Illumina QC Check Table FASTQC read number input by @xonq in #976
[TheiaViral] Enable Krakentools + CheckV soft-fail by @xonq in #964
[Documentation] Update 1 SOP by @nehavm456 in #978
[Documentation] Updating 1 SOP by @nehavm456 in #973
[Summarize_Data] Ensure specific string matching by @xonq in #977
[Freyja] Update to Freyja2 and more additions to freyja by @Michal-Babins in #961
[FastP] Expose fastp docker and json by @xonq in #989
[TheiaViral | Trimmomatic] Support for Adapter and Primer Trimming by @MrTheronJ in #969
[Vibecheck] Fix subsampling arg declaration by @xonq in #982
[Organism Parameters] Update nextclade dataset tags and pangolin docker version by @Michal-Babins in #993
[Documentation] Update SOP entries for TheiaProk Illumina PE v3 and v4 by @brunatodani in #991
[Documentation] Update BaseSpace_Fetch_PHB SOP to v4 by @cimendes in #994
[Basespace_Fetch] Extending grep -E to project ID track by @awh082834 in #981
[Flu Track] Expose and deinterleave IRMA aligned reads by @awh082834 in #990
[TheiaViral_Panel] Turn Nextclade Outputs Generic by @awh082834 in #997
[QC_Check | Flu_Track] Refactor and implement segment QC-check by @xonq in #980
[bbduk] Add pre-alignment primer trimming by @MrTheronJ in #998
[v4.1.0] Release preparation by @xonq in #995
[Assembly Stats] Update samtools by @xonq in #1000

New Contributors

@nehavm456 made their first contribution in #978
@brunatodani made their first contribution in #991

Full Changelog: v4.0.0...v4.1.0

Contributors

cimendes, xonq, and 5 other contributors

01 Dec 20:09

sage-wright

v4.0.0

567a9d6

Public Health Bioinformatics v4.0.0 Major Release Notes

This release adds three new workflows, reworks the organism-specific characterization logic in TheiaCoV and TheiaViral, and makes significant improvements to many workflows. Documentation updates and various bug fixes have also been implemented.

There are several breaking changes in this release that prevent backwards compatibility. We’ve marked these items in the release notes with "breaking change" in the header. To learn how to migrate to this release, please see our migration guide: Migration to PHB v4.

See more details regarding each of these changes here!

🆕 New Workflows

TheiaViral_Panel_PHB
- TheiaViral_Panel is a workflow that incorporates the assembly approach of TheiaViral_Illumina_PE into a panel-compatible format. By using a set of taxon IDs, reads that are specific to each included taxon are extracted for attempted genome assembly and any applicable viral characterization.
- Import this workflow from Dockstore.
PhyloCompare_PHB
- PhyloCompare generates cophylogeny plots that visualize the differences in two phylogenetic trees’ branching orders and tip arrangements (topology). PhyloCompare includes an additional quantitative validation module, which can validate that two phylogenies have the same topology using distance metrics.
- Import this workflow from Dockstore.
ONT_Barcode_Concatenation_PHB
- ONT read data sometimes requires concatenation by barcode. This workflow enables easy concatenation of your read data and adds it to a new or existing Terra table.
- Import this workflow from Dockstore.

🚀 Changes to existing workflows

All workflows that characterize viral pathogens
- morgana_magic, a new subworkflow, now controls all viral characterization logic (breaking change)
All TheiaCoV Workflows
- Nextclade was updated to version v3.16.0
- Nextclade dataset tags have been updated
- Pangolin was updated to version 4.3.3-pdata-1.36
- VADR was updated to version 1.6.4
- VADR now supports measles, mumps, and rubella
- IRMA and iVar now summarize minor alleles for each influenza segment
- Additional read quality score metrics have been added to the TheiaCoV workflows
TheiaCoV_FASTA
- Influenza characterization is now supported
- The qc_check_phb module was renamed to qc_check_task to match other workflows (breaking change)
All TheiaEuk workflows and Cauris_CladeTyper
- The C. auris CladeTyper tool now includes the Clade VI reference.
- The Clade I reference has been updated to use a complete genome.
All TheiaEuk and TheiaProk workflows
- The read screen now handles cryptic errors better
All TheiaProk workflows
- AMRFinderPlus gene outputs are now alphabetized
- The AMRFinderPlus database was updated to version 2025-07-16.1
- The Bakta proteins input parameter was corrected to be a File type variable
- SeqSero2 has been deprecated in favor of SeqSero2S (breaking change)
- ECTyper has been updated to 2.0.0
- ResFinder now has additional and updated outputs
All TheiaProk workflows and Gambit_Query
- The GAMBIT prokaryotic database was updated to version v2.1.0
TheiaProk_Illumina_PE, TheiaProk_ONT, and TBProfiler_tNGS
- TBProfiler VCF output is now correctly captured
- TBProfiler database branches can now be specified (breaking change)
- tbp-parser min_depth is now explicitly set to 10
- tbp-parser coverage calculations are now correct when tngs_data is set to true
TBProfiler_tNGS
- The Trimmomatic bases_to_crop default value has been removed (breaking change)
- The Trimmomatic module is now optional
- Clockwork read decontamination is now available as an optional module
- Read statistics are now generated with fastq_scan
All TheiaViral workflows
- TheiaViral now incorporates genomic characterization modules for an extended range of pathogens
- Hosting of internally versioned databases and taxonomy
- Several inputs have changed location (breaking change)
All ONT workflows
- The read_qc_trim module was renamed to read_QC_trim to match other workflows (breaking change)
Samples_to_Ref_Tree is now Nextclade_Batch
- Samples_to_Ref_Tree has been renamed to Nextclade_Batch and now has updated error handling (breaking change)
Augur
- Augur and Augur_Prep have been updated to v35.1.0 and revamped to improve performance (breaking change)
Core_Gene_SNP
- snp-sites is now used in core tree generation
Mercury_Prep_N_Batch
- Mercury was updated to version 1.1.3
- BankIt FASTA and metadata files are now workflow outputs
Terra_2_NCBI
- Read files are now renamed to their corresponding library_ID field
BaseSpace_Fetch
- Discrepant separators ("_" vs "-") between sample names and BaseSpace entities are now handled
- Only intended samples are now pulled from BaseSpace
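One way to picture both BaseSpace_Fetch fixes is the Python sketch below: treat `_` and `-` as equivalent, then require an exact match on the whole normalized name rather than a substring match, so `sample-1` finds `sample_1` but not `sample_10`. The helper names are hypothetical, and this is an illustration of the idea, not the workflow's actual matching logic.

```python
from __future__ import annotations


def normalize(name: str) -> str:
    # Treat "-" and "_" as the same separator.
    return name.replace("-", "_")


def matching_entities(sample_name: str, entity_names: list[str]) -> list[str]:
    """Return BaseSpace entity names that refer to exactly this sample.

    A substring test such as `sample_name in entity` would also pull in
    "sample_10" for "sample_1"; comparing whole normalized names avoids that.
    """
    target = normalize(sample_name)
    return [entity for entity in entity_names if normalize(entity) == target]


print(matching_entities("sample-1", ["sample_1", "sample_10", "sample-1"]))
```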

📚 Documentation updates

All workflow input and output tables were synced and are now completely up-to-date
Dockstore links have been added to the “Quick Facts” section for every workflow for easier import
New SOPs were added
The runtime section in our code contribution guide now specifies disk and disks (Thanks, Ash O’Farrell!)
TheiaCoV documentation has been overhauled and reorganized
- The TheiaCoV_Illumina and TheiaCoV_ONT workflow diagrams have been updated
TheiaProk documentation has been overhauled and reorganized
- The TheiaProk_ONT workflow diagram has been updated
TheiaValidate documentation was updated
Typos have been eliminated, and in many places, clarity was finally restored

What's Changed

[PhyloCompare] Create phylogenetic comparison and validation workflow by @xonq in #771
[TheiaProk] Add theiaprok_ont diagram by @cimendes in #881
[TBProfiler] Capture VCF Output by @MrTheronJ in #884
[Documentation - TheiaCoV] update theiacov diagram by @cimendes in #888
[PhyloCompare] Add cophylogeny plot generation module, update docs, and versions by @xonq in #889
[PhyloCompare | TheiaViral] Documentation update by @xonq in #890
[TheiaCov_FASTA] Add support for Influenza by @MrTheronJ in #872
[Documentation] Add Dockstore links to Quick Facts by @sage-wright in #894
[Samples_to_Ref_Tree -> Nextclade_Batch] Identify legacy reference trees by @xonq in #887
[AMRFinderPlus] Alphabetize gene outputs for AMRFinderPlus string outputs by @awh082834 in #897
[Documentation] Improve TheiaCoV Documentation by @sage-wright in #896
[TheiaViral] Implement genomic characterization for other viruses by @xonq in #893
[AMRFinderPlus] Update to database version 2025-07-16.1 by @awh082834 in #902
[Mercury_Prep_N_Batch] Updating Mercury Version; Exposing Bankit files and changing metadata to a TSV by @awh082834 in #904
[TheiaEuk | CladeTyper] Add Clade VI reference and implement CladeTyper thresholding by @xonq in #871
[Nextclade] Update Nextclade to v3.16.0 and update dataset tags by @awh082834 in #907
[VADR] Refactor and Add Support for Additional Viruses by @MrTheronJ in #882
[Documentation] Sync All Inputs and Outputs by @MrTheronJ in #906
[ECTyper] Update to 2.0.0 with added outputs and documentation updates by @awh082834 in #847
[Documentation] Fix formatting in TheiaValidate and link typo in digger_denovo by @theiadeb in #912
[Documentation] TheiaProk task overhaul by @sage-wright in #913
[ONT_Barcode_Concatenation] New workflow to concatenate ONT barcodes by @sage-wright in #900
[Documentation] TheiaCoV assembly descriptions and other fixes by @sage-wright in #916
[Bakta] Change Boolean Input to Correct File Designation by @awh082834 in https://github.com/theiage...

Contributors

cimendes, aofarrel, and 8 other contributors

17 Jul 19:46

xonq

v3.1.1

0d3ce7a

Public Health Bioinformatics v3.1.1 Patch Release Notes

🩹 This patch release reverts the `ts_mlst` default behavior to its pre-v3.1.0 state and enables use-case-specific `ts_mlst` input options, alongside other minor changes

Find our full release notes here!
Find our documentation here!

Bug Fixes

This patch release resolves unintended typing of E. coli as Aeromonas and other organisms with similar alleles. When an E. coli sample was passed, Aeromonas alleles could be called before all of the alleles associated with E. coli were identified, which could cause a mischaracterization as Aeromonas. Users can now use mlst_scheme_override to toggle on/off the exclusion of problematic allele sets with E. coli samples. By default, this behavior is turned off ("false") because it is targeted for a specific use case. We generally recommend that users do not enable this option because it will overwrite the default sets excluded by ts_mlst. Please reach out if you would like to discuss this option.
Reverted ts_mlst so that it no longer runs both schemes for E. coli and A. baumannii by default. The previous release (v3.1.0) changed ts_mlst to enable both schemes associated with E. coli and A. baumannii by default. While this behavior may still be desirable, we inadvertently introduced it in a way that users could not "opt in" to this functionality. Users can now decide whether they wish to see single or double MLST scheme outputs in a single output column using the mlst_run_secondary_scheme boolean input. By default, this behavior is turned off ("false").
Fixed a Mercury error where GISAID metadata would not populate for Mpox when skip_ncbi was true.

Other Updates

The TheiaProk_Illumina Diagram is updated to include digger_denovo submodules.
Samples_to_Ref_Tree is updated to include multiple genomes in a Nextclade phylogeny by accepting multiple samples as an input array.

What's Changed

[Documentation] update theiaprok_ilmn diagram by @cimendes in #875
[TS_MLST] Refactor of E coli, A baumannii secondary scheme selection and scheme override by @awh082834 in #873
[Samples_to_Ref_tree] Multi-sample genome input by @xonq in #868
[Mercury] update mercury version by @xonq in #879
[TS_MLST] Update docs to include more info regarding override usage by @awh082834 in #880

Full Changelog: v3.1.0...v3.1.1

Contributors

cimendes, xonq, and awh082834

03 Jul 21:06

MrTheronJ

v3.1.0

577835e

Public Health Bioinformatics v3.1.0 Minor Release Notes

This minor release adds four new workflows - TheiaViral_Illumina_PE, TheiaViral_ONT, TheiaEuk_ONT, and Terra_2_ENA. Documentation updates and various bug fixes have also been implemented.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows

TheiaViral_ONT & TheiaViral_Illumina_PE
- These workflows generate de novo and consensus viral genome assemblies for either Oxford Nanopore (ONT) or Illumina paired-end (PE) sequencing data. TheiaViral is generalized to accommodate diverse and segmented viral lineages, including hantavirus, norovirus, rabies, influenza, HIV, herpes simplex, Ebola, and hepatitis C.
- TheiaViral also enables rabies genotyping by introducing a new Lyssavirus rabies Nextclade dataset.
- TheiaViral implements de novo assembly and dynamic reference genome selection to enable consensus genome assembly. The preexisting TheiaCoV workflow is a consensus assembly and characterization pipeline specialized for a subset of viral lineages with static reference genomes.
- Import TheiaViral_Illumina_PE from Dockstore
- Import TheiaViral_ONT from Dockstore
TheiaEuk_ONT
- TheiaEuk_ONT is an Oxford Nanopore Technologies (ONT) genome assembly and fungal genome characterization pipeline. TheiaEuk_ONT is currently intended for haploid fungal genomes. The TheiaEuk workflows support the de novo assembly, quality assessment, and characterization of fungal genomes. This version has been updated to accept basecalled ONT reads as the primary input.
- Import TheiaEuk_ONT from Dockstore
Terra_2_ENA

💡 We are confident in the functionality of this workflow, but were not able to source any partners for final UAT, as many sites do not actively submit to ENA. As such, if you are interested in using this workflow, we'd greatly appreciate working with you for feedback.
- Introduces a standalone workflow for submitting data to the European Nucleotide Archive. Currently supports prokaryotic and viral sample types.
- The workflow is structured in three phases: preparation, registration, and submission.
- The workflow begins by downloading the user's Terra data table and validating its contents against ENA requirements. Every submission to ENA requires mandatory fields for both sample metadata and raw read data.
- Before using the workflow, users must register a study with ENA to obtain a study accession number.
- Users should also review the documentation to determine which fields are required for their specific sample_type (currently supporting prokaryotic and viral samples) and add those fields to their Terra data table or input TSV file.
- Import Terra_2_ENA from Dockstore
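Because Terra_2_ENA validates the data table against ENA's mandatory fields, it can help to check a TSV locally before launching the workflow. The Python sketch below flags samples whose required columns are missing or blank. It is a hypothetical pre-flight helper, not part of the workflow, and the field names in the example (`field_a`, `field_b`, `field_c`) are placeholders; the actual required fields for each sample_type are listed in the Terra_2_ENA documentation.

```python
from __future__ import annotations

import csv
import io


def missing_required(tsv_text: str, required: set[str]) -> dict[str, set[str]]:
    """Map each sample ID (first column) to required fields that are absent or blank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if not reader.fieldnames:
        raise ValueError("TSV has no header row")
    id_column = reader.fieldnames[0]
    problems: dict[str, set[str]] = {}
    for row in reader:
        # A column absent from the header yields None, which counts as missing.
        missing = {field for field in required if not (row.get(field) or "").strip()}
        if missing:
            problems[row[id_column]] = missing
    return problems


# Placeholder field names; substitute the fields required for your sample_type.
table = "entity:sample_id\tfield_a\tfield_b\ns1\tvalue\t\ns2\tvalue\tvalue\n"
print(missing_required(table, {"field_a", "field_b", "field_c"}))
```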

🚀 Changes to existing workflows

All TheiaCoV Workflows
- Added support for Measles virus analysis using Nextclade
- IRMA now handles cases when an assembly cannot be created
- Enabled Nextclade for H5N1 samples with GenoFLU genotype D1.1
- Updated Pangolin docker image
All TheiaProk Workflows
- Added task_arln_stats.wdl to include missing ARLN stat checks
- Allowed task_mlst to handle mischaracterization for certain E. coli schemes
- ANI calculations are now aborted if a table is only populated with 0 values
- Updated stxtyper docker image
- Updated task_kmerfinder documentation
- Updated AMRFinder docker image
- Updated GAMBIT Prokaryotic DB
- Implemented updates and changes to correspond to Phoenix / ARLN statistics criteria
TheiaEuk_Illumina_PE
- Enabled kraken2 read classification
TheiaEuk_Illumina_PE, TheiaProk_Illumina_PE, TheiaProk_Illumina_SE
- Deprecated shovill and replaced with digger_denovo subworkflow
TheiaProk_Illumina_PE, TheiaProk_Illumina_SE, TheiaProk_ONT
- Handled edge case where no reads map in task_shigatyper
TheiaProk_Illumina_PE
- Restricted Vibecheck to run only if sample is classified as O1 Vibrio.
All ONT workflows that run Kraken2
- Implemented memory efficient parsing and exposed extra memory parameters
Samples_to_Ref_Tree & All TheiaCov workflows
- Created functionality to accept custom user-provided Nextclade datasets
- Updated Nextclade dataset tags and docker image
BaseSpace_Fetch
- Set api_server to default address
TBProfiler_tNGS_PHB
- Fixed percent coverage and average depth calculations for tNGS data
- Updated tbp_parser docker image
NCBI-AMRFinderPlus_PHB
- Updated AMRFinder docker image
Mercury_Prep_N_Batch_PHB
- Made organism case-insensitive, decoupled organism input from metadata population, and created metadata_organism as argument for populating this field
GAMBIT_Query_PHB
- Updated GAMBIT Prokaryotic DB

📚 Documentation Updates

The Freyja page was improved
The GAMBIT_Query_PHB page was updated and improved
The TheiaProk Workflow Series page was improved
The TheiaEuk Workflow Series page was improved
Added SOPs
Updated the workflow relationships diagram

What's Changed

[Documentation] Update Freyja documentation page by @ss43 in #816
[Documentation] create modular documentation functionality and consolidate input/output tables by @sage-wright in #824
[ARLN] Addition of task_arln_stats.wdl to include missing ARLN stat checks by @awh082834 in #821
[MLST] Allow for task_mlst to handle mischaracterization for certain ecoli schemes by @awh082834 in #819
[kraken2_parse_classified] Implement memory efficient parsing and expose parameters by @MrTheronJ in #820
[Shovill] deprecate shovill and introduce digger_denovo subworkflow by @Michal-Babins in #823
[MUMmer ANI] Abort ANI extraction if a table is only populated with 0 values by @xonq in #830
[Basespace_Fetch] Set api_server to default address by @awh082834 in #831
[Documentation] Widespread macro implementation for the Quick Facts section by @sage-wright in #832
[KmerFinder, TheiaEuk] Docs update to include links and more in depth description by @awh082834 in #841
[STXtyper] Update to container version 1.0.42 by @awh082834 in #840
[Nextclade] Enable custom datasets by @xonq in #833
[Documentation] Add SOPs and update the render_tsv_table macro implementation by @sage-wright in #838
[Documentation] install pandas so that documentation can build by @sage-wright in #843
[Requester Pays] transition all resources to the theiagen-public-resources-rp bucket by @sage-wright in #839
[TheiaProk] Restrict Vibecheck to run on O1 V. cholerae only by @sage-wright in #842
[tbp_parser] Fixing percent coverage calculation for tNGS analysis by @MrTheronJ in #844
[Documentation] Update workflow relationships diagram, fix a few broken links, various tidying by @sage-wright in #849
[Nextclade] Version and DS tag update by @awh082834 in #850
[TheiaEuk] TheiaEuk ONT Workflow by @Michal-Babins in #644
[AMRFinder] AMRFinder update to v4.0.23 and database v2025-06-03.1 by @awh082834 in #854
[Terra_2_ENA] Workflow for submitting data to the European Nucleotide Archive by @MrTheronJ in #826
[TheiaProk] Handle edge case where no reads map in task_shigatyper by @MrTheronJ in #851
[Nextclade & Measles] Adding Measles functionality to TheiaCov by @awh082834 in h...

Contributors

ss43, sage-wright, and 5 other contributors

21 Apr 20:22

sage-wright

v3.0.1

eabf8cf

Public Health Bioinformatics v3.0.1 Patch Release Notes

🩹 This patch release fixes several bugs that were identified

Find our full release notes here!
Find our documentation here!

Bug Fixes

A bug in TheiaProk_ONT assembly polishing with short reads has been fixed.
A bug in Dorado_Basecalling (when the number of chunks equaled the number of POD5 files) has been fixed.
Ash O’Farrell fixed a bug in the tbp-parser task to prevent division by zero when a sample has extremely low coverage. Thanks, Ash!
The tbp_parser_min_frequency and tbp_parser_min_percent_coverage variables are appropriately Floats and not Integers. You can provide as many decimals as you want now (but only one per value, in accordance with the rules of math).

Future Proofing

We’ve made a few changes to our workflows to permit compatibility with Terra changing to use the Google Batch API (see the details regarding these changes from Terra here).
We have ended developmental support for the Nullarbor_PHB workflow. This workflow will be available for PHB v3.0.0 and below in perpetuity but will no longer be updated.

Other Updates

The AMR_Search task has been added to the TheiaProk and TheiaEuk workflows! To run AMR_Search, set the new input parameter run_amr_search to True.
The following Docker images have been updated:
- AMRFinderPlus → us-docker.pkg.dev/general-theiagen/staphb/ncbi-amrfinderplus:4.0.19-2024-12-18.1
- MLST → us-docker.pkg.dev/general-theiagen/staphb/mlst:2.23.0-2024-12-31
- tbp-parser → us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.4.5
- Pangolin → us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33
Various documentation updates have been made and broken links fixed.

What's Changed

[Merlin_Magic] Integrate AMR_Search_PHB into Merlin_Magic by @awh082834 in #797
Handle edge case of extremely low coverage by @aofarrel in #798
[AMRFinderPlus] Docker update to ncbi-amrfinderplus:4.0.19-2024-12-18.1 by @awh082834 in #799
[MLST] update docker container by @xonq in #800
[Documentation] Various updates by @sage-wright in #807
[tbp-parser] update version by @MrTheronJ in #803
Path Correction for Google Batch API Changes by @sage-wright in #806
[VADR] Update mem for gcp batch by @Michal-Babins in #808
[Pangolin] Update to 4.31-pdata-1.33 by @awh082834 in #811
[AMR_Search & Merlin_Magic] Address style guide issues and formatting by @awh082834 in #813
[v3.0.1] Update versioning by @sage-wright in #810
[TheiaProk] update export_taxon_table and documentation by @sage-wright in #814

Full Changelog: v3.0.0...v3.0.1

Contributors

aofarrel, sage-wright, and 4 other contributors

03 Apr 20:19

sage-wright

v3.0.0

14b5a9d

Public Health Bioinformatics v3.0.0 Major Release

This major release adds four new workflows, updates the assembly algorithm for TheiaProk_ONT, and makes significant improvements to many workflows. Documentation updates and various bug fixes have also been implemented.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows:

Clair3_Variants_ONT_PHB
- This workflow processes ONT sequencing data to identify genetic variants relative to a reference genome. Clair3 is a small-variant caller for long reads and is now the preferred option in the latest version of the ARTIC pipeline. This update allows for compatibility with newer basecalling models and changes the default model to r1041_e82_400bps_hac_v420. Other supported Clair3 models can be found here
- Import this workflow from Dockstore
kSNP4_PHB
- This workflow mimics kSNP3 but uses the latest version — called kSNP4. Algorithmically, there are no significant changes between kSNP3 and kSNP4. The main benefit of upgrading to kSNP4 is that it is more optimized and reduces overall runtime. Since the software changed names, we made a new workflow so the names could match.
- Import this workflow from Dockstore
AMR_Search_PHB
- This is a standalone workflow for PathogenWatch AMR-search; future plans include adding this workflow to TheiaProk. It encapsulates the AMR portion of PathogenWatch, allowing users to retain that functionality without housing their data with an external source. It uses PAARSNP to infer resistance phenotypes. Currently, functionality is limited to a small number of species.
- Import this workflow from Dockstore
Dorado_Basecalling_PHB
- This workflow performs GPU-accelerated basecalling on Oxford Nanopore POD5 files. The user uploads all of the raw POD5 files to their workspace (or an alternative Google Cloud Storage bucket location), where the workflow will identify, basecall, demultiplex, and optionally trim the POD5 files into FASTQs. The resulting FASTQ files will be added to an existing or new Terra table with one row for each barcode.
- Import this workflow from Dockstore

🚀 Changes to existing workflows:

All Genomic Characterization Workflows
- Language has been standardized
- skip_screen actually skips the read screen task
- All read_screen checks are run and the results are output to a file
All ONT Workflows
- Additional inputs for read_qc_trim are now available for modification
All TheiaCoV Workflows
- Nextclade has been updated to v3.10.2
- Default Nextclade dataset tags have been updated
- Pangolin has been updated to 4.3.1-pdata-1.32
TheiaCoV_Illumina_PE and TheiaCoV_ONT: Influenza Characterization
- IRMA has been updated to v1.2.0 and new inputs and outputs are now available
- MIRA standards are now used for default values in IRMA
- GenoFLU has been updated to v1.06, has a new input, and now runs only when the clade is 2.3.4.4b.
- Custom databases can now be provided to Nextclade for B3.13 H5N1 samples
- Nextclade now runs on H5N1 samples if a custom dataset file is provided
TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE
- iVar variants no longer fails due to lack of variant identification
TheiaEuk_Illumina_PE
- The GAMBIT fungal database has been updated to v1.0.0
- RASUSA has been updated to v2.1.0
- A few C. auris cladetyping outputs were renamed
All TheiaProk Workflows
- export_taxon_table inputs removed from Terra
- Bakta database customization is now available
- hicap optional parameters are now available for modification (for Haemophilus influenzae)
- sonneityping no longer fails due to sonnei typing disagreement (for Shigella sonnei)
- StxTyper has been updated to v1.0.40 and new outputs are available (for Shigella spp., but can be run on any sample)
- SeqSero2 has been updated to v2.1.3.1 and new outputs are available (for Salmonella spp.)
- SISTR has been updated to v1.1.3 and new outputs are available (for Salmonella spp.)
- TBProfiler has been updated to v6.6.3 (for Mycobacterium tuberculosis)
- tbp-parser has been updated to v2.4.4 and a new input is available (for Mycobacterium tuberculosis)
TheiaProk_Illumina_PE
- vibecheck added for Vibrio cholerae characterization
TheiaProk_ONT
- The assembly process was transitioned away from dragonflye in favor of a non-wrapped individual tool implementation
- The plasmid detection tool, Tiptoft, was deprecated
- RASUSA has been updated to v2.1.0
All Freyja Workflows
- Freyja has been updated to v1.5.3
- A reference GFF can now be provided to Freyja_FASTQ and the primer_bed input is now optional
- Additional output columns have been created for Freyja_FASTQ from the results file
- The pathogen flag has been added to all Freyja commands in Freyja_FASTQ
All Phylogenetic Workflows
- The reorder_matrix task output file suffix was changed to be more general
Augur_PHB
- The augur_clades task no longer runs if an augur_clades_tsv is provided
Snippy_Streamline and Snippy_Streamline_FASTA
- When include_gbff is true, the GBFF file is now used as the reference
- Viral genomes can be retrieved (for a reference) by setting use_ncbi_virus to true
- The centroid genome can be used as the reference by setting use_centroid_as_reference to true
Kraken_PE, Kraken_SE, and TheiaMeta_Illumina_PE
- krona was updated to be compatible with viral data
Mercury_Prep_N_Batch
- Some metadata columns can now be populated from workflow inputs
Terra_2_NCBI
- A custom mapping file for tables with different column headers is available
Assembly_Fetch_PHB
- Viral genomes can be retrieved by setting use_ncbi_virus to true.
SRA_Fetch
- Workflow versioning information is now output
Cauris_Cladetyper
- Outputs were renamed
RASUSA_PHB
- RASUSA has been updated to v2.1.0
TBProfiler_tNGS_PHB
- The bases_to_crop variable default has been set to 0.
- TBProfiler has been updated to v6.6.3
- tbp-parser has been updated to v2.4.4 and a new input is available
TheiaValidate_PHB
- TheiaValidate has been updated to v1.1.2

📖 Documentation updates

Formalized the philosophy of workflow failures - see “Getting Started”
Created guides for GAMBIT, GAMBIT database creation, phylogenetics, and custom organisms through TheiaCoV
A SARS-CoV-2 wastewater metadata formatter was added to Terra_2_NCBI
Fixed input table in the Augur_PHB documentation
Added documentation for the qc_check task in TheiaCoV
Many dead links have been fixed
The TheiaMeta documentation page was improved
The Freyja documentation page was improved
The Snippy_Streamline and Snippy_Streamline_FASTA pages were improved
The data export workflows were improved and standardized; these include Concatenate_Column_Content, Transfer_Column_Content, and Zip_Column_Content
The data import workflows were improved and standardized; these include Assembly_Fetch, BaseSpace_Fetch, Create_Terra_Table, and SRA_Fetch.
Sources added to the influenza antiviral resistance detection module
The new workflow documentation template was improved
We’ve fixed our formatting in multiple places and standardized our language in others.

What's Changed

[Documentation] Fix overflow in Augur documentation by @sage-wright in #706
[GenoFLU] update docker to v1.05 by @sage-wright in #704
[Documentation] add qc check documentation to TheiaCoV by @sage-wright in #701
[Cauris_Cladetyper] Various improvements and removal of old TheiaCauris references by @sage-wright in #700
[read_QC_trim_ONT] add additional inputs for user modification by @sage-wright in #702
[Read_Screen] skip_screen actually skips the screen by @sage-wright in #699
[SRR_Fetch] update doc links by @fraser-combe in #719
[Documentation] SC2 Wastewater metadata formatter by @frankambrosio3 in #721
[Documentation] Adding philosophy of workflow failures page by @sage-wright in https://github.com/theiagen/p...

Contributors

watronfire, kapsakcj, and 8 other contributors

19 Dec 20:43

sage-wright

v2.3.0

f81fdb1

Public Health Bioinformatics v2.3.0 Minor Release

This minor release adds two new workflows, Fetch_SRR_Accession_PHB and Concatenate_Illumina_Lanes_PHB, and makes significant improvements to the TheiaCoV, TheiaEuk, TheiaProk, and TheiaMeta workflow series. Documentation updates and various bug fixes have also been implemented.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows

Concatenate_Illumina_Lanes_PHB
- Some Illumina sequencing platforms produce FASTQ files split across multiple lanes for a single sample. This workflow combines multi-lane FASTQ files from an Illumina sequencing run into a single read1 file and a single read2 file per sample. It is ideal when data from multiple lanes must be combined before running analysis workflows such as assembly or variant calling, as it ensures that downstream workflows receive consolidated FASTQ files.
- This workflow is designed to run automatically at the start of the TheiaProk workflow if multi-lane FASTQ files are provided (e.g., read1_lane2.fastq.gz and read2_lane2.fastq.gz)
- Import this workflow from Dockstore
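A minimal sketch of what lane concatenation amounts to, assuming gzip-compressed FASTQ files and that the lane files for one read direction are listed in lane order. The function name concatenate_lanes is hypothetical and this is not the workflow's actual implementation. Because the gzip format allows several compressed members in one file, the compressed bytes can be appended without decompressing them.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concatenate_lanes(lane_files: list[Path], output: Path) -> Path:
    """Append gzip-compressed FASTQ lane files, in the given order, into one file.

    A gzip file may contain several concatenated members, so copying the raw
    compressed bytes yields a valid .fastq.gz without decompressing anything.
    """
    with output.open("wb") as out:
        for lane_file in lane_files:
            with lane_file.open("rb") as handle:
                shutil.copyfileobj(handle, out)
    return output


if __name__ == "__main__":
    # Demo: two single-record read1 lane files combined into one read1 file.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        lanes = []
        for lane, record in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGG\n+\nIIII\n"], start=1):
            lane_path = tmp_dir / f"read1_lane{lane}.fastq.gz"
            with gzip.open(lane_path, "wt") as fh:
                fh.write(record)
            lanes.append(lane_path)
        combined = concatenate_lanes(lanes, tmp_dir / "read1.fastq.gz")
        with gzip.open(combined, "rt") as fh:
            print(fh.read(), end="")
```

The same call would be repeated for the read2 lane files; reads keep the lane order they were passed in.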
Fetch_SRR_Accession_PHB
- This workflow will retrieve any Sequence Read Archive (SRA) accessions (SRR) associated with a given sample accession, such as a BioSample ID (e.g., "SAMN00000000") or SRA Experiment ID (e.g., "SRX000000").
  - This process utilizes the fastq-dl tool to fetch metadata from SRA and outputs the corresponding SRR accession(s).
  - If multiple SRR accessions are linked to a single sample, the workflow will output them as a comma-separated list.
- This workflow is particularly useful for retrieving SRR accessions a few days after running Terra_2_NCBI workflows.
- Import this workflow from Dockstore
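As a rough illustration of the comma-separated output, the sketch below collects SRR run accessions from a tab-separated metadata table. The column name run_accession and the function name collect_srr_accessions are assumptions for illustration only; check the header of the metadata file fastq-dl actually writes before relying on this.

```python
import csv
import io


def collect_srr_accessions(metadata_tsv: str, column: str = "run_accession") -> str:
    """Return the SRR run accessions in a metadata table as a comma-separated string.

    `column` is an assumed column name. Non-SRR runs (e.g. ERR/DRR) are skipped,
    and duplicates are dropped while keeping first-seen order.
    """
    seen: dict[str, None] = {}
    for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        accession = (row.get(column) or "").strip()
        if accession.startswith("SRR"):
            seen.setdefault(accession, None)
    return ",".join(seen)
```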

🚀 Changes to existing workflows

All Genomic Characterization Workflows
- The read screen is now compatible with Dorado-produced FASTQ files
All Illumina Workflows
- fastq_scan has been updated to the latest version
All TheiaCoV Workflows
- The percentage of mapped reads is now output in all TheiaCoV workflows (except TheiaCoV_FASTA)
- The default Nextclade dataset tags have been updated for SC2, mpox, flu, RSV-A, and RSV-B
- The default Pangolin docker is now us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.31
- Kraken2 standalone is now used and databases must be provided.
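The exact calculation TheiaCoV uses is not described here. As a hedged sketch, a percentage of mapped reads can be derived from samtools flagstat output by dividing the QC-passed "mapped" count by the QC-passed "in total" count. Note that flagstat counts alignment records, so secondary and supplementary alignments are included in both numbers; the function name percent_mapped is hypothetical.

```python
import re

# Matches e.g. "1000 + 0 in total (QC-passed reads + QC-failed reads)"
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total\b")
# Matches e.g. "950 + 0 mapped (95.00% : N/A)" but not "950 + 0 primary mapped (...)"
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def percent_mapped(flagstat_text: str) -> float:
    """Percentage of QC-passed records that are mapped, rounded to 2 decimals."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        line = line.strip()
        if (m := TOTAL_RE.match(line)):
            total = int(m.group(1))
        elif (m := MAPPED_RE.match(line)):
            mapped = int(m.group(1))
    if total is None or mapped is None:
        raise ValueError("could not find the 'in total' and 'mapped' lines")
    if total == 0:
        return 0.0
    return round(100 * mapped / total, 2)
```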
TheiaCoV_Illumina_PE and TheiaCoV_ONT
- Default parameters have been set for H5N1 flu
- IRMA-assembled flu segments are now output in sorted order, from largest to smallest
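The notes do not show how the IRMA task reorders its output. As an illustration only, the sketch below reorders FASTA records by sequence length, largest first, which is one plausible reading of "largest to smallest"; both function names are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sort_segments_by_length(text: str) -> str:
    """Return the records sorted from longest to shortest sequence.

    Python's sort is stable, so segments of equal length keep their input order.
    """
    records = parse_fasta(text)
    records.sort(key=lambda record: len(record[1]), reverse=True)
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```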
All TheiaEuk Workflows
- Additional genes for Candida auris are now examined by default in the Snippy_Gene_Query task
- A bug affecting the snippy_variants_num_variants output column for Cryptococcus neoformans has been fixed
TheiaMeta_Illumina_PE
- MIDAS is now an optional task in TheiaMeta.
All TheiaProk Workflows
- stxtyper was added to all TheiaProk workflows
TheiaProk_Illumina_PE and TheiaProk_Illumina_SE
- Multi-lane Illumina data can now be used as input natively.
TheiaProk_Illumina_PE and TheiaProk_ONT
- TBProfiler has been updated to v6.4.1
- tbp-parser has been updated to v2.2.2
Augur_PHB
- Versioning information for the tree-building tools is now available
All Freyja Workflows
- Freyja now supports non-SARS-CoV-2 organisms natively.
Mercury_Prep_N_Batch
- Errors no longer occur when data has been previously transferred
- The correct information is now being provided for GISAID’s covv_coverage column for ClearLabs data
- Failures now cause the task to fail instead of passing silently
Snippy Workflows
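- A new file with QC metrics, including the percentage of reads aligned, is now created
- Additional QC metrics are now output, including in the Snippy_Tree and Snippy_Streamline workflows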
Terra_2_NCBI_PHB
- Collection dates will no longer have decimals

📚 Documentation Updates

Tables can now be searched with table-specific search bars
Dead links removed
Generally improved documentation

What's Changed

[Documentation] Updated Snippy variants output documentation by @fraser-combe in #623
[TheiaCoV] iVar Consensus Pipefail fix by @Michal-Babins in #629
[TheiaProk] expose sistr optional param inputs to theiaProk wfs by @fraser-combe in #603
[Documentation] fix broken links by @sage-wright in #627
Snippy_Variants: Calculate % reads aligned by @fraser-combe in #616
[Augur +TheiaCoV] Enable H5N1 flu subtype augur & nextclade by @Michal-Babins in #640
[TheiaMeta] Midas call in read_QC_trim_pe.wdl workflow and outputs by @fraser-combe in #619
[TheiaCoV] Reorder flu segments from largest to smallest in irma task by @Michal-Babins in #635
[Mercury] prevent silent failures by @sage-wright in #648
Fixed theiacov documentation to specify assembly order by @Michal-Babins in #652
[TheiaCov & TheiaProk & TheiaEuk] read screen ONT bugfix and improvements by @kapsakcj in #650
[TheiaCoV ONT and Clearlabs] Update consensus task container to artic:1.2.4-1.12.0 by @cimendes in #636
[Documentation] Search bar for tables within docs by @fraser-combe in #646
[TheiaEuk] Additional genes for Snippy_Gene_Query by @sage-wright in #647
[MerlinMagic] Fixed output for crypto snippy_variants_num_variants by @Michal-Babins in #654
[Documentation] type error correction theiacov wf by @fraser-combe in #660
[TheiaProk] Adds stxtyper to merlin_magic and TheiaProk wfs by @kapsakcj in #525
[Mercury] bump mercury docker to 1.0.9: bugfix for GISAID metadata covv_coverage column by @kapsakcj in #661
[TheiaCov] wfs add percentage_mapped_reads by @fraser-combe in #641
[Documentation] Update MIDAS database documentation in TheiaProk by @fraser-combe in #667
Add Snippy_Variants QC outputs to Snippy_Tree and Snippy_Streamline workflow outputs by @jrotieno in #592
[TheiaCoV/TheiaProk/TheiaMeta/TheiaEuk/Freyja_FASTQ] fastq-scan updates & improvements. Adding JSON as wf output file by @kapsakcj in #662
Prevent Silent Errors by @sage-wright in #666
[Augur] Add augur tree iqtree model type to output by @Michal-Babins in #674
[Terra2NCBI] Force collection_date to be a string by @cimendes in #658
[Documentation] Update code contribution guidelines by @fraser-combe in #675
[Retrieve_SRR_Metadata] New wf to retrieve SRR after Terra2NCBI wf by @fraser-combe in #668
Documentation Update by @frankambrosio3 in #678
[Documentation] Various updates by @sage-wright in #680
[TheiaCoV] Update nextclade dataset tags and pangolin docker version by @Michal-Babins in #679
[Documentation] update dataset tags by @Michal-Babins in #681
[TheiaCoV] Split database from Kraken2_TheiaCoV task by @cimendes in #670
[TheiaCoV] Update nextclade dataset tag for H5N1 to the latest version by @Michal-Babins in #683
[Freyja] Update freyja to version 1.5.2, expose pathogen flag and minor update to docs by @cimendes in #684
[Augur] Expose Augur versions by @Michal-Babins in #686
[TheiaProk] Update default versions for TB-Profiler and tbp-parser by @sage-wright in #673
v2.3.0 final changes by @sage-wright in #693
[Concatenate_Illumina_Lanes] Fix bug when single-end only by @sage-wright in https://gith...

Contributors

kapsakcj, cimendes, and 5 other contributors

17 Sep 15:12

sage-wright

v2.2.1

9a10de7

Public Health Bioinformatics v2.2.1 Patch Release Notes

🩹 This patch release fixes the output names for the NCBI-Scrub standalone workflows.

Our documentation has also been migrated to GitHub for easier maintenance.

Full release notes can be found here!
Find our documentation here!

What's Changed

[Documentation] Transfer all PHB documentation to GitHub by @sage-wright in #605
[NCBI Scrub Standalone Workflows] Correct output declarations for the number of spots removed by @cimendes in #610
[v2.2.1] update version tag by @sage-wright in #622

Full Changelog: v2.2.0...v2.2.1

Contributors

cimendes and sage-wright

03 Sep 13:22

sage-wright

v2.2.0

5be3433

Public Health Bioinformatics v2.2.0 Minor Release Notes

This minor release adds two new workflows, Create_Terra_Table_PHB and Snippy_Streamline_FASTA_PHB, and makes significant improvements to the TheiaProk, TheiaCoV, TheiaMeta, and Freyja workflow series. Additionally, several bug fixes have been made.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows:

Create_Terra_Table_PHB
- The manual creation of Terra tables can be tedious and error-prone. This workflow will automatically create your Terra data table when provided with the location of the files. It can import assembly, paired-end (Illumina) and single-end (Illumina and Oxford Nanopore) data.
- Import the workflow from Dockstore.
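Terra data tables are loaded from a TSV whose first column header takes the form entity:<table>_id. The sketch below shows the general idea, assuming hypothetical read1/read2 filename suffixes (the workflow lets users supply their own) and an output table named sample; files that match no suffix are skipped. The function name build_terra_table is illustrative and this is not the workflow's actual code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical suffixes used to pair files by sample name.
READ1_SUFFIXES = ("_R1.fastq.gz", "_R1.fq.gz", "_1.fastq.gz", "_1.fq.gz")
READ2_SUFFIXES = ("_R2.fastq.gz", "_R2.fq.gz", "_2.fastq.gz", "_2.fq.gz")


def build_terra_table(paths: list[str], table_name: str = "sample") -> str:
    """Group read files by sample name and render a Terra-style load TSV."""
    rows: dict[str, dict[str, str]] = defaultdict(dict)
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        for column, suffixes in (("read1", READ1_SUFFIXES), ("read2", READ2_SUFFIXES)):
            for suffix in suffixes:
                if filename.endswith(suffix):
                    rows[filename[: -len(suffix)]][column] = path
                    break
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{table_name}_id", "read1", "read2"])
    for sample in sorted(rows):
        writer.writerow([sample, rows[sample].get("read1", ""), rows[sample].get("read2", "")])
    return out.getvalue()
```

Single-end samples simply end up with an empty read2 cell in this sketch.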
Snippy_Streamline_FASTA_PHB
- Since Snippy_Variants_PHB now accepts assembled sequences in FASTA format as input, we have developed Snippy_Streamline_FASTA, an all-in-one approach to generating a reference-based phylogeny using the Snippy tools that mirrors the Snippy_Streamline_PHB workflow. By default, it runs Snippy_Variants and Snippy_Tree; if a reference genome is not provided, it also runs Assembly_Fetch.
- Import the workflow from Dockstore.

🚀 Changes to existing workflows:

All TheiaProk Workflows
- Genomic characterization with emmtyper is now enabled for Streptococcus pyogenes. (Thanks, @sam-baird!)
- When call_ani is true, failures will no longer occur if multiple hits have the same score.
- Support for Vibrio parahaemolyticus, Vibrio vulnificus and Enterobacter asburiae was added to the AMRFinderPlus task
- VirulenceFinder now runs on Shigella sonnei samples.
- The Docker containers for AMRFinderPlus, tbp-parser and mlst have been updated:
  - AMRFinderPlus: 3.12.8-2024-07-22.1
  - tbp-parser: 1.6.0
  - mlst: 2.23.0-2024-08-01
- Genomic characterization can now be skipped by setting the new optional input perform_characterization to false.
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
- Optional inputs are now available for all tasks within the merlin_magic subworkflow.
All TheiaCoV Workflows
- GenoFLU has been added for H5N1 influenza typing.
- Additional VADR output files have been exposed:
  - File? vadr_feature_tbl_pass
  - File? vadr_feature_tbl_fail
  - File? vadr_classification_summary_file
  - File? vadr_all_outputs_tar_gz
- Aligned FASTQs no longer contain supplementary/secondary alignments.
TheiaCoV_Illumina_PE_PHB and TheiaCoV_ONT_PHB
- Workflow will no longer fail if an assembly cannot be produced. The assembly_fasta column will say "Assembly could not be generated".
TheiaEuk_Illumina_PE_PHB
- TheiaEuk no longer abruptly fails if an organism outside of the expected list of taxa is detected by GAMBIT.
- All optional inputs and docker containers for taxa-specific sub-modules have been exposed.
All ONT workflows (TheiaProk and TheiaCoV)
- KMC is no longer used for genome-size prediction. Instead, for TheiaProk, the expected genome length is now set to 5 Mb, which is around 0.7 Mb larger than the average bacterial genome length. For TheiaCoV, species have default genome lengths associated with their organism tag.
TheiaCoV and TheiaMeta workflows
- The human read removal tool (HRRT) has been updated to v2.2.1. For paired-end data, reads are first interleaved to guarantee that no mates are orphaned by this tool.
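The notes only state that paired-end reads are interleaved before human read removal. The sketch below shows what interleaving (and splitting back) means for 4-line FASTQ records, so that each mate 1 is immediately followed by its mate 2; all function names are hypothetical and this is not the task's actual code.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return
        if not lines[3]:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in lines)


def interleave(read1: TextIO, read2: TextIO, out: TextIO) -> int:
    """Write mate 1 then mate 2 for each pair; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in zip_longest(fastq_records(read1), fastq_records(read2)):
        if rec1 is None or rec2 is None:
            raise ValueError("read1 and read2 contain different numbers of reads")
        out.write("\n".join(rec1) + "\n" + "\n".join(rec2) + "\n")
        pairs += 1
    return pairs


def deinterleave(interleaved: TextIO, read1_out: TextIO, read2_out: TextIO) -> int:
    """Split alternating records back into two files; return the number of pairs."""
    records = fastq_records(interleaved)
    pairs = 0
    for rec1 in records:
        rec2 = next(records, None)
        if rec2 is None:
            raise ValueError("interleaved FASTQ has an unpaired final record")
        read1_out.write("\n".join(rec1) + "\n")
        read2_out.write("\n".join(rec2) + "\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n")
    merged = io.StringIO()
    interleave(r1, r2, merged)
    print(merged.getvalue(), end="")
```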
All Freyja Workflows
- Freyja has been updated for all workflows to version 1.5.1.
- The SARS-CoV-2 UShER barcodes file is now a .feather file.
- Freyja_FASTQ_PHB is now compatible with Illumina paired-end, Illumina single-end and Oxford Nanopore data. A new input ont has been added to control workflow behavior.
- The UShER barcodes and lineage files used are now exposed as outputs in Freyja_FASTQ_PHB
Snippy_Variants_PHB
- In addition to paired-end and single-end reads, assemblies are now accepted as input. To use Illumina sequencing data, pass the forward reads to the read1 input and, optionally, the reverse reads to the read2 input. To use assembled genomes, pass them to the assembly_fasta input and omit read1 and read2.
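The input rules above can be summarized as a small decision function. This is a hypothetical helper written for illustration, not part of the workflow; in particular, treating reads plus an assembly together as an error is an assumption drawn from the instruction to omit read1 and read2 when using assembly_fasta.

```python
from typing import Optional


def snippy_input_mode(
    read1: Optional[str] = None,
    read2: Optional[str] = None,
    assembly_fasta: Optional[str] = None,
) -> str:
    """Classify which kind of input was supplied (hypothetical helper)."""
    if assembly_fasta:
        if read1 or read2:
            # Assumption: mixing reads and an assembly is rejected.
            raise ValueError("provide either reads or assembly_fasta, not both")
        return "assembly"
    if read1:
        return "paired-end" if read2 else "single-end"
    if read2:
        raise ValueError("read2 was given without read1")
    raise ValueError("no input provided")
```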
SRA_Fetch_PHB
- The workflow now detects when a downloaded file is a low-quality SRA-Lite file.
Augur_PHB
- The mpox mutation context, which displays the fraction of G->A or C->T mutations, has been added to the auspice_input_json output.
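To make "the fraction of G->A or C->T" concrete, the sketch below computes that fraction over the single-base substitutions between a reference and a sample sequence that are already aligned to the same coordinates. It is an illustration under that assumption, not Augur's implementation, and the function name is hypothetical.

```python
def ga_ct_fraction(reference: str, sample: str) -> float:
    """Fraction of single-base substitutions that are G->A or C->T.

    Both sequences must already be aligned to the same length. Positions
    where either base is not A/C/G/T (e.g. N or '-') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    total = matching = 0
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base == alt_base or ref_base not in "ACGT" or alt_base not in "ACGT":
            continue
        total += 1
        if (ref_base, alt_base) in {("G", "A"), ("C", "T")}:
            matching += 1
    return matching / total if total else 0.0
```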
GAMBIT_Query_PHB
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
Mercury_Prep_N_Batch_PHB
- Mercury has been moved to its own repository at https://github.com/theiagen/mercury.
- Mercury now processes BioSample & SRA metadata for flu

What's Changed

[TheiaProk] Add emmtyper task for Streptococcus pyogenes by @sam-baird in #524
[SRA-Fetch] Detect SRA-Lite when it's low quality file by @cimendes in #512
Adding the Create_Terra_Table_PHB workflow by @sage-wright in #533
[Create_Terra_Table] recognize fastq files that end in .fq by @sage-wright in #535
[TheiaProk - ANI] prevent failures when multiple top hits have the same score by @sage-wright in #532
[TheiaCoV] Flu: Prevent workflow failures when assembly cannot be produced; generate NanoPlot outputs regardless of assembly success by @sage-wright in #530
[theiaprok] amrfinderplus: add support for Vibrio parahaemolyticus, Vibrio vulnificus, Enterobacter asburiae. Fix C diff bug by @kapsakcj in #542
[TheiaCoV] Add GenoFLU for flu whole-genome genotyping by @sage-wright in #540
[TheiaProk] Merlin_magic subwf bugfix: run virulencefinder on Shigella sonnei by @kapsakcj in #543
[TheiaCoV and TheiaMeta] Update hrrt (ncbi-scrub) to version 2.2.1 and optimise task by @cimendes in #527
[TheiaCoV and TheiaMeta - HRRT] Patch bug by removing unneeded awk verification by @cimendes in #550
Create CODEOWNERS by @AndrewLangvt in #554
[TheiaProk] Add additional input enabling characterization by @sage-wright in #547
Updating templates & broken links in the readme by @sage-wright in #555
[TheiaEuk] Fix bug where String outputs were being passed as File for Snippy_variants by @cimendes in #574
[TheiaProk] update tbp-parser to latest version by @sage-wright in #576
[Create_Terra_Table] fix bug, and enable ability for users to provide their own file ending suffixes by @sage-wright in #575
[theiacov] Add additional vadr output files & tarball; upgrade VADR docker by @kapsakcj in #556
[ONT] Remove KMC by @sage-wright in #578
[Create_Terra_Table] fix sample name i...

Contributors

kapsakcj, AndrewLangvt, and 4 other contributors

26 Jun 14:14

cimendes

v2.1.0

d0377e1

Public Health Bioinformatics v2.1.0 Minor Release Notes

This minor release improves the utility and usability of several workflows dedicated to Oxford Nanopore Technologies data for viral and bacterial genomic characterization (TheiaCoV and TheiaProk). Additionally, support for new organisms has been added to several workflows.

Full release notes can be found here!

Find our documentation here!

🚀 Changes to existing workflows:

All TheiaProk Workflows
- General Abricate is now available through the call_abricate and abricate_db optional inputs.
- Abricate specifically for Vibrio cholerae is now available. It launches automatically if the gambit_predicted_taxon or expected_taxon is Vibrio cholerae.
- A new optional parameter separate_betalactam_genes is now available that splits AMRFinderPlus beta-lactam hits into new columns.
- The call_midas optional input is now set to false by default.
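A hedged sketch of what splitting beta-lactam hits by subclass can look like. It assumes AMRFinderPlus 3.x tab-separated output with the columns "Gene symbol", "Class", and "Subclass"; the function name and the lowercase subclass keys are illustrative and do not reflect the actual output column names TheiaProk uses.

```python
import csv
import io
from collections import defaultdict


def split_betalactam_genes(amrfinder_tsv: str) -> dict[str, str]:
    """Group BETA-LACTAM hits by Subclass, one comma-separated gene list per subclass.

    Assumes AMRFinderPlus 3.x column names ("Gene symbol", "Class", "Subclass").
    """
    by_subclass: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(amrfinder_tsv), delimiter="\t"):
        if row.get("Class") != "BETA-LACTAM":
            continue
        subclass = (row.get("Subclass") or "BETA-LACTAM").strip().lower()
        by_subclass[subclass].append(row["Gene symbol"])
    return {subclass: ",".join(genes) for subclass, genes in sorted(by_subclass.items())}
```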
TheiaProk_Illumina_PE
- New read quality-control outputs have been added: r1_mean_q_clean, r2_mean_q_clean, r1_mean_readlength_clean and r2_mean_readlength_clean.
TheiaProk_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
- Kraken2 is now available through the call_kraken and kraken_db optional inputs.
- A maximum genome size of 10Mbp is set to prevent excessive runtimes.

All TheiaCoV Workflows

RSV-A and RSV-B can now be analyzed with the TheiaCoV workflows. Nextclade characterization and Kraken taxonomic analysis will now be run on RSV samples.

The default Nextclade dataset tags have been updated for the following organisms:

Organism	New default Nextclade dataset tag
SARS-CoV-2	"2024-06-13--23-42-47Z"
mpox	"2024-04-19--07-50-39Z"
Flu H1N1 HA	"2024-04-19--07-50-39Z"
Flu H1N1 NA	"2024-04-19--07-50-39Z"
Flu H3N2 HA	"2024-04-19--07-50-39Z"
Flu H3N2 NA	"2024-04-19--07-50-39Z"
Flu Victoria HA	"2024-04-19--07-50-39Z"
Flu Victoria NA	"2024-04-19--07-50-39Z"

TheiaCoV_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
TheiaCoV Flu Track
- All of the flu-specific tasks now live in their own sub-workflow, flu_track. This has no effect on the end-user.
- In TheiaCoV_ONT, flu samples will now have the assembly mean coverage of both the HA and NA segments appear in the assembly_mean_coverage output variable. This reflects the behaviour already present in TheiaCoV_Illumina_PE.
- The all-segments FASTA header lines now include samplename.
- The new output irma_subtype_notes now indicates if IRMA was able to determine the flu subtype
- All workflows now use abricate_flu_subtype (instead of irma_subtype) to select the appropriate nextclade_dataset_tag.
- Nextclade output columns for flu now explicitly state either HA or NA.
- Padded assemblies, in which any - characters in the final assembly file are removed and any . characters are replaced by N, are now provided to MAFFT and VADR to prevent task failures.
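The cleanup rule above can be sketched as a simple text transform over sequence lines, leaving FASTA headers untouched. This is an illustration of the stated rule only; the function name is hypothetical and the actual task may implement it differently.

```python
def clean_padded_assembly(fasta_text: str) -> str:
    """Remove '-' and replace '.' with 'N' in sequence lines; headers are kept as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace("-", "").replace(".", "N"))
    return "".join(f"{line}\n" for line in cleaned)
```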
Terra_2_NCBI
- Skipping BioSample submission via the skip_biosample optional input now removes the requirement to have BioSample metadata in your data table.
Augur_Prep_PHB and Augur_PHB
- RSV-A and RSV-B can now be analyzed with the Augur workflows.
- Metadata is no longer required to run Augur. If metadata is not provided, only a distance tree will be created.
kSNP3 and other phylogenetic inference workflows
- Outputs from phylogenetic workflows (SNP matrices) and the summarize_data task will now have a properly toggleable Phandango coloring suffix.
- The phandango_coloring optional input is now off by default.

Docker container updates:

IRMA has been updated to version v1.1.5
AMRFinderPlus has been updated to version v3.12.8-2024-05-02.2
ts_mlst database has been updated as of 2024-06-01
Pangolin database has been updated to pdata v1.27

🐛 Bug fixes and small improvements:

TheiaProk_ONT and TheiaProk_FASTA: Hicap was being run in TheiaProk_ONT but the outputs were never appearing in the data table! This has been fixed.
All TheiaCoV workflows: Unsupported organisms will no longer cause workflow failures.
Terra_2_NCBI: Fixed a typo when using the Wastewater Biosample package that was causing an error.
Freyja_Dashboard: The freyja_dasbhoard output variable now correctly says freyja_dashboard.
Workflows that accept String inputs that are used to name things: Several input variables, such as cluster_name, now accept Strings with whitespace; spaces are automatically converted to dashes.
All workflows: Runtime parameters have been adjusted for several tasks.
TheiaCoV Flu Track: A bug has been fixed for IRMA running out of disk space. Additionally, another bug affecting Flu B samples was fixed related to empty HA segment FASTA files.

What's Changed

TheiaCoV wf support for RSV - run nextclade by default and small optimizations (kraken_target_organism, genome_length) by @kapsakcj in #436
[New workflow - internal] Gambitcore for assembly quality assessment with GAMBIT by @cimendes in #466
[TheiaProk_ONT and TheiaCoV_ONT] Expose additional QC metrics from nanoplot for both raw and clean reads by @cimendes in #452
Exposing r1 and r2 mean_q_clean and mean_readlength_clean by @jrotieno in #455
[TheiaProk_ONT] add patch fix to kmc estimated genome size to not go over 10Mbp by @cimendes in #459
Add abricate as optional module by @jrotieno in #431
[TheiaProk_ONT] Add Kraken2 as part of read_qc by @cimendes in #438
[Flu] Assembly mean coverage & read screen clean-up by @sage-wright in #469
[Freyja_Dashboard] fix typo in freyja_dashboard output File variable name by @AndrewLangvt in #482
[Terra_2_NCBI] remove metadata requirements with skip_biosample == true by @sage-wright in #475
Augur Updates for RSV-A and RSV-B by @jrotieno in #478
[kSNP3] fix behaviour when phandango colouring is set to false by @cimendes in #496
[Internal] Updating runtime parameters by @sage-wright in #494
Automatically convert spaces to dashes in workflows that accept strings by @AndrewLangvt in #498
[TheiaCoV] Enable user to run TheiaCoV with an unsupported organism by @sage-wright in #501
[AMRFinderPlus] parse BETA-LACTAM genes and subclasses into individual output columns by @sage-wright in #505
IRMA bug fixes & improvements; theiacov_illumina_pe wf updates for Flu by @kapsakcj in #468
Augur_PHB: Set sample_metadata_tsvs input to optional by @jrotieno in #503
[Internal - Gambitcore] Downgrade database to stable 1.3.0 version by @cimendes in #473
[TheiaCoV_Illumina_PE & _ONT] Create sub-workflow for flu-specific modules by @sage-wright in #502
[TheiaProk] Add abricate module for vibrio characterization by @cimendes in #429
[TheiaProk] expose hicap outputs in theiaprok_fasta and theiaprok_ont by @cimendes in #508
Fix typo in Terra_2_NCBI Wastewater metadata by @michellescribner in #519
[TheiaProk] Update amrfinderplus to v3.12.8; DB: v2024-05-02.2; reduce compute resources by @kapsakcj in #514
[TheiaProk] upgrade mlst docker image to 2024-06-01 staphb build; reduced runtime parameters; enable preemptible by @kapsakcj in #516
update default...

Contributors

kapsakcj, AndrewLangvt, and 4 other contributors

Releases: theiagen/public_health_bioinformatics