v2.2.0

HAISeq released this 07 Jan 06:44

v2.2.0 (01/06/2026)

COMMAND CHANGE:

  • Because -entry has been deprecated since Nextflow v24.10.0, we switched to using --mode to run specific workflows (PHOENIX, CDC_PHOENIX, etc.). This parameter is case insensitive.
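As a rough illustration of what "case insensitive" means here, the Python sketch below normalizes the value before matching it, so --mode phoenix and --mode PHOENIX select the same workflow. This is not PHX code (PHX is a Nextflow pipeline); normalize_mode and KNOWN_MODES are hypothetical names, and only the modes named in these notes are listed.

```python
# Hypothetical sketch, not PHX code: normalize a case-insensitive --mode value.
# Only modes named in these release notes are listed; others exist ("etc.").
KNOWN_MODES = {"PHOENIX", "CDC_PHOENIX", "UPDATE_PHOENIX"}


def normalize_mode(raw: str) -> str:
    """Return the upper-case mode name, or raise ValueError if it is unknown."""
    mode = raw.strip().upper()
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown --mode {raw!r}; expected one of {sorted(KNOWN_MODES)}")
    return mode


if __name__ == "__main__":
    for value in ("phoenix", "Cdc_Phoenix", "UPDATE_PHOENIX"):
        print(f"{value} -> {normalize_mode(value)}")
```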

Implemented Enhancements:

  • Created --mode UPDATE_PHOENIX, which takes in a PHoeNIx directory (running all samples in it) or a samplesheet (with the format "sample,dir") and updates MLST and AR calls. Files are overwritten in place, and a "${samplename}_updater_log.tsv" file is created the first time this is run and updated every time it is run thereafter. This file contains a record of what was updated and when.
  • --create_ncbi_sheet now creates separate Excel sheets for each BioProject (if there is more than one in your run) to make uploading to NCBI easier.
  • Keeping the highlighted big 5 genes up to date, particularly OXA genes, had become too big a lift to hard code, so the Beta-Lactamase DataBase (BLDB) was added as a reference. The process is described in the wiki.
  • To reduce the space needed to save PHX output, *.kraken2_trimd.classifiedreads.txt and *.kraken2_wtasmbld.classifiedreads.txt were removed from the PHX output. If you need or want these files, you can get them from the work directory of the KRAKEN2_TRIMD and KRAKEN2_ASMBLD process(es). Alternatively, you can create your own config file that adds back the publishing of these files.
  • Improved linking of taxonomy across modules. Using the NCBI TaxID in ANI, Assembly_Ratio and GC_Content allows for more standardized comparisons across tools.
  • Expanded the available taxonomy for MLST and SRST2 to match the expansion of pubMLST and other MLST databases.
  • MLST profile output now merges novel alleles into a single profile (e.g. if MLST and SRST2 both find a novel allele at the rpoB locus, the output shows rpoB(12*,33~)) instead of showing two separate lines/profiles.
  • Code base reductions:
    • Condensing GENERATE_PIPELINE_STATS modules and subworkflows.
    • The GRiPHin module was rewritten to be more idiomatic for Nextflow (i.e. the module runs off input files rather than a directory). Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help with this.
    • Removed DETERMINE_TAXA_ID_FAILURE, CREATE_SUMMARY_LINE_FAILURE and GENERATE_PIPELINE_STATS_FAILURE_EXQC modules, by condensing them into DETERMINE_TAXA_ID, CREATE_SUMMARY_LINE and GENERATE_PIPELINE_STATS_FAILURE respectively.
  • Haemophilus influenzae and Bordetella pertussis were added as possible taxa to pass to AMRFinder with --organism. Burkholderia mallei was moved out of the Burkholderia pseudomallei complex set and is now run simply as Burkholderia mallei.
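The UPDATE_PHOENIX inputs described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's implementation: it assumes the samplesheet has a literal "sample,dir" header line, and the helper names (read_samplesheet, append_update_log) and the log's columns are hypothetical, since these notes do not document the log layout.

```python
import csv
import datetime
import os


def read_samplesheet(path):
    """Parse a CSV samplesheet with a 'sample,dir' header into (sample, dir) pairs."""
    with open(path, newline="") as handle:
        return [(row["sample"], row["dir"]) for row in csv.DictReader(handle)]


def append_update_log(outdir, sample, what_changed, when=None):
    """Create <sample>_updater_log.tsv on first use, then append one row per run.

    The columns written here (date, sample, updated) are hypothetical.
    """
    when = when or datetime.datetime.now().isoformat(timespec="seconds")
    log_path = os.path.join(outdir, f"{sample}_updater_log.tsv")
    is_new = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if is_new:
            # Header is written only the first time the log is created.
            writer.writerow(["date", "sample", "updated"])
        writer.writerow([when, sample, what_changed])
    return log_path
```

Appending rather than rewriting keeps earlier entries intact, which matches the notes' description of the log as a running record.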
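A minimal sketch of merging per-locus calls from MLST and SRST2 into a single profile entry, reproducing the rpoB(12*,33~) example above. The function name and input shape are hypothetical; alleles are treated as opaque strings, so the * and ~ markers are simply carried through, and reporting an allele both tools agree on as locus(allele) is an assumption.

```python
def merge_locus_calls(calls_by_tool):
    """Merge {tool: {locus: allele}} into {locus: "locus(a,b,...)"}.

    Distinct alleles for the same locus are joined in tool order;
    an allele reported by several tools appears only once.
    """
    merged = {}
    for calls in calls_by_tool.values():
        for locus, allele in calls.items():
            alleles = merged.setdefault(locus, [])
            if allele not in alleles:
                alleles.append(allele)
    return {locus: f"{locus}({','.join(alleles)})" for locus, alleles in merged.items()}


if __name__ == "__main__":
    profile = merge_locus_calls({
        "MLST": {"rpoB": "12*", "gyrB": "5"},
        "SRST2": {"rpoB": "33~", "gyrB": "5"},
    })
    print(profile)
```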

Summary File Changes:

  • For SPAdes failures, lack of reads after trimming, or corruption, we simplified the warnings produced in GRiPHin.py by suppressing other warnings, as the root cause is one of the aforementioned failures. Similarly, if the reason for the Auto QC Failure is "Assembly file not found", only that is reported rather than a list of files with unknowns.
  • New columns added to GRiPHin summary files: PHX_Version, Final_Taxa_ID and ShigaPass_Organism
  • To align the GRiPHin summary files with Phoenix_Summary.tsv, the columns Final_Taxa_ID and ShigaPass_Organism were added to Phoenix_Summary.tsv. Additionally, the Species column was renamed to FastANI_Organism, Taxa_Confidence to FastANI_%ID, and Taxa_Coverage to FastANI_%Coverage.
  • AMRFinder files
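For scripts that read older Phoenix_Summary.tsv files, a small Python sketch of the column renames listed above may help. It only rewrites the header row; it cannot backfill the new Final_Taxa_ID and ShigaPass_Organism columns, which are not present in older files. rename_summary_header is a hypothetical helper.

```python
import csv
import io

# Old -> new Phoenix_Summary.tsv column names, as listed in these notes.
RENAMED_COLUMNS = {
    "Species": "FastANI_Organism",
    "Taxa_Confidence": "FastANI_%ID",
    "Taxa_Coverage": "FastANI_%Coverage",
}


def rename_summary_header(tsv_text):
    """Return tsv_text with pre-v2.2.0 header names replaced; data rows are unchanged."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    rows[0] = [RENAMED_COLUMNS.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```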

Terra.bio Output Updates:

  • Columns are now reported based on *_GRiPHin_Summary.tsv, except for BETA_LACTAM_RESISTANCE_GENES, OTHER_AR_GENES, AMRFINDER_POINT_MUTATIONS, HYPERVIRULENCE_GENES and PLASMID_INCOMPATIBILITY_REPLICONS, which still come from the Phoenix_Summary.tsv file.
    • MLST1_NCBI and MLST2_NCBI columns were added; they combine the MLSTs and MLST_SCHEMEs columns and are formatted for uploading to NCBI following ARLN guidance.
    • SHIGAPASS_TAXA is the output of Shigapass, if it was run. The TAXA_SOURCE column states whether Shigapass was used for the final taxa call.
    • FINAL_TAXA_ID is the final taxa call for the isolate.
    • N50 is the N50 from Quast.
    • WARNINGS_COUNT was changed to WARNINGS, which prints out the warnings rather than just a count.
    • AMRFinderPlus genes are now reported in the columns AMRFINDERPLUS_AMR_CLASSES, AMRFINDERPLUS_AMR_CORE_GENES, AMRFINDERPLUS_AMR_PLUS_GENES, AMRFINDERPLUS_AMR_SUBCLASSES, AMRFINDERPLUS_STRESS_GENES and AMRFINDERPLUS_VIRULENCE_GENES.
    • To reduce the space needed to save PHX output, *.kraken2_trimd.classifiedreads.txt and *.kraken2_wtasmbld.classifiedreads.txt are no longer output from PHX. *.kraken2_asmbld.classifiedreads.txt was added as an output because it contains taxids; note that it is a different file from *.kraken2_wtasmbld.classifiedreads.txt. These files aren't really needed except for edge cases such as questions about conflicting results or investigating suspected contamination.
  • Because "when" blocks are deprecated in Nextflow, when statements in modules were removed and .filter{} is used throughout instead. Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help implementing this.
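To make the Terra column sourcing concrete, here is a hedged Python sketch that builds one Terra row from a GRiPHin summary row and a Phoenix_Summary.tsv row. It assumes both rows are dicts already keyed by the Terra column names, which is a simplification; build_terra_row is a hypothetical helper.

```python
# Terra columns that, per these notes, still come from Phoenix_Summary.tsv.
PHOENIX_SUMMARY_COLUMNS = (
    "BETA_LACTAM_RESISTANCE_GENES",
    "OTHER_AR_GENES",
    "AMRFINDER_POINT_MUTATIONS",
    "HYPERVIRULENCE_GENES",
    "PLASMID_INCOMPATIBILITY_REPLICONS",
)


def build_terra_row(griphin_row, phoenix_row):
    """Take every column from the GRiPHin row except the five listed above,
    which are filled from the Phoenix_Summary.tsv row (empty if missing)."""
    row = {k: v for k, v in griphin_row.items() if k not in PHOENIX_SUMMARY_COLUMNS}
    for column in PHOENIX_SUMMARY_COLUMNS:
        row[column] = phoenix_row.get(column, "")
    return row
```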

Fixed Bugs:

  • Taxonomy Fixes:
    • BAD BUG!! sort_and_prep_dist.sh was not evaluating scientific notation, so in some cases exact matches were not being reported. For context, when reviewing our dataset of 71,670 samples, 2,837 (~4%) had scientific notation in their Mash distances, and 31 (0.04%) have a different species when sorted with the scientific notation taken into account compared to what PHX originally reported. All 31 would be considered part of the same complex, e.g. E. hormaechei/E. cloacae, E. coli/Shigella, K. michiganensis/K. oxytoca. Thus, the impact on previously reported taxa isn't expected to be large, but this fix may resolve some differences reported between MALDI and WGS.
    • Shigapass was added to correctly distinguish between E. coli and Shigella. If FastANI determines the species to be either E. coli or Shigella, Shigapass now runs to confirm the call. In GRiPHin there is a new Final_Taxa_ID column with the final determined call. The Taxa_source column still says ANI_REFSEQ if the FastANI call was kept, and now says Shigapass if Shigapass determined the FastANI call to be wrong and it was overwritten. This was added to all entry points.
  • Changes were made to allow -resume to work better.
  • More robust checks in PHoeNIx to ensure sample names are pulled in correctly.
  • Fixed error that caused the column "No_AR_Genes_Found" to not appear in the GRiPHin report.
  • Fix for --coverage being converted to a string when run on Seqera Cloud. Thanks to @DOH-JDJ0303 for the PR.
  • Changes to which genes are highlighted in the GRiPHin_Summary:
    • The Beta-Lactamase DataBase (BLDB) is now used as an input to determine which genes to highlight, rather than hard coding. Big 5 genes whose function is labelled ESBL, IR or IR ESBL are no longer highlighted as big 5 genes, as they are not thought to have carbapenemase activity.
    • Full details on highlighting methods can be found in the wiki.
  • The column Kraken_ID_Raw_Reads_% in the GRiPHin summary files (xlsx and tsv) was changed to Kraken_ID_Trimmed_Reads_% to accurately reflect what that column has been reporting... whoopsie.
  • Fixed a bug where passing samples were not entering the BBDuk step because their file names contained forward/reverse rather than R1/R2.
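The scientific-notation bug can be illustrated in Python (the shell script itself is not reproduced here). Comparing Mash distances as text ranks 1e-05 after 0.000123 because "1" sorts after "0", so the closest reference is missed; parsing the values as numbers fixes the order. In shell, GNU sort's -g (general numeric) option understands e-notation while -n does not, though these notes do not say which approach the fix used. The function names, reference names and distances are made up for illustration.

```python
def closest_by_text(distances):
    """Buggy approach: compare Mash distances as plain strings."""
    return min(distances, key=lambda pair: pair[1])


def closest_by_value(distances):
    """Fixed approach: parse each distance (including e-notation) as a float."""
    return min(distances, key=lambda pair: float(pair[1]))


if __name__ == "__main__":
    hits = [("refA", "0.000123"), ("refB", "1e-05"), ("refC", "0.0021")]
    print("text order picks:", closest_by_text(hits)[0])    # refA (wrong)
    print("numeric order picks:", closest_by_value(hits)[0])  # refB (closest)
```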
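The Final_Taxa_ID / Taxa_source rule from the Shigapass bullet can be summarized as a small decision function. This is a sketch under assumptions: final_taxa_call is a hypothetical name, species strings are compared exactly, and the Shigapass call is assumed to already use the same species naming as FastANI.

```python
# FastANI calls that trigger a Shigapass check (prefix match on the species string).
E_COLI_OR_SHIGELLA = ("Escherichia coli", "Shigella")


def final_taxa_call(fastani_species, shigapass_call=None):
    """Return (Final_Taxa_ID, Taxa_source).

    shigapass_call is None when Shigapass was not run.
    """
    needs_check = fastani_species.startswith(E_COLI_OR_SHIGELLA)
    if not needs_check or shigapass_call is None or shigapass_call == fastani_species:
        # FastANI call is kept.
        return fastani_species, "ANI_REFSEQ"
    # Shigapass disagreed, so its call overwrites the FastANI call.
    return shigapass_call, "Shigapass"
```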
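A hedged sketch of the highlighting rule from the BLDB bullet: a gene is highlighted only if BLDB places it in a big 5 family and its function is not ESBL, IR or IR ESBL. The bldb_table mapping, the family labels (taken here to be KPC, NDM, VIM, IMP and OXA-48-like) and the function strings are assumptions; the wiki describes the actual method.

```python
# Assumed family labels for the big 5 carbapenemases.
BIG5_FAMILIES = {"KPC", "NDM", "VIM", "IMP", "OXA-48-like"}
# Functions that exclude a big 5 gene from highlighting, per these notes.
EXCLUDED_FUNCTIONS = {"ESBL", "IR", "IR ESBL"}


def highlighted_genes(genes, bldb_table):
    """Return genes to highlight as big 5 genes, preserving input order.

    bldb_table is a hypothetical mapping gene -> (family, function) built
    from BLDB; genes missing from it are never highlighted in this sketch.
    """
    result = []
    for gene in genes:
        family, function = bldb_table.get(gene, (None, None))
        if family in BIG5_FAMILIES and function not in EXCLUDED_FUNCTIONS:
            result.append(gene)
    return result
```

Looking genes up in a table, rather than matching name prefixes, is what lets variants of one family be treated differently based on their recorded function.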

Container Updates:

  • Containers updated to include developers' bug fixes:

Database Updates:

  • Curated AR gene database was updated on 2025-12-08 (yyyy-mm-dd) to include the new AMRFinder database:
  • The MLST database is now created using the pubMLST API and merged with the unique schemes available on the Pasteur and EnteroBase sites
    • Numerous new schemes were added including a significant group that now contain more than a single scheme for an organism
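Conceptually, the database merge above keeps every pubMLST scheme and adds only those Pasteur and EnteroBase schemes that pubMLST does not already provide. The Python sketch below shows that precedence rule; it assumes schemes from the three sources have already been matched to a shared identifier (in practice the hard part), and merge_schemes is a hypothetical helper.

```python
def merge_schemes(pubmlst, *other_sources):
    """Merge {scheme_id: scheme} dicts, giving pubMLST precedence.

    Schemes from later sources are added only if their (assumed shared)
    identifier is not already present.
    """
    merged = dict(pubmlst)
    for source in other_sources:
        for scheme_id, scheme in source.items():
            merged.setdefault(scheme_id, scheme)
    return merged
```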