v2.2.0 (01/06/2026)
COMMAND CHANGE:
- Due to the deprecation of `-entry` since Nextflow v24.10.0, we switched to using `--mode` to run specific workflows (`PHOENIX`, `CDC_PHOENIX`, etc.). This parameter is case insensitive.
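Because `--mode` is case insensitive, `cdc_phoenix` and `CDC_PHOENIX` select the same workflow. The following is a minimal Python sketch of that kind of normalization. It is not PHoeNIx code (the pipeline itself is Nextflow), `resolve_mode` is a hypothetical helper, and the mode set holds only the modes named in these notes, so it is incomplete.

```python
# Hypothetical sketch: illustrates case-insensitive handling of a --mode value.
# Only the modes named in these release notes are listed; the real set is larger.
KNOWN_MODES = {"PHOENIX", "CDC_PHOENIX", "UPDATE_PHOENIX"}


def resolve_mode(mode: str) -> str:
    """Normalize a user-supplied --mode value, e.g. 'cdc_phoenix' -> 'CDC_PHOENIX'."""
    normalized = mode.strip().upper()
    if normalized not in KNOWN_MODES:
        raise ValueError(f"Unknown --mode {mode!r}; expected one of {sorted(KNOWN_MODES)}")
    return normalized


print(resolve_mode("cdc_phoenix"))  # CDC_PHOENIX
```

Upper-casing once at the entry point means every later comparison can use the canonical spelling.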
Implemented Enhancements:
- Created `--mode UPDATE_PHOENIX` to take in a PHoeNIx output directory (runs all samples in the directory) or a samplesheet (with format "sample,dir") and update MLST and AR calls. Files are overwritten in place, and a `${samplename}_updater_log.tsv` file is created the first time this is run and updated every time it is run thereafter. This file contains a record of what was updated and when.
- `--create_ncbi_sheet` now creates separate Excel sheets for each BioProject (if there is more than one in your run) to make upload to NCBI easier.
- Updating which big 5 genes are highlighted, particularly OXA genes, had become too big a lift to hard code, so the BLDB database was added to the reference files. The process is described in the wiki.
- To reduce the space needed to save PHX output, `*.kraken2_trimd.classifiedreads.txt` and `*.kraken2_wtasmbld.classifiedreads.txt` were removed from PHX output. If you need these files, you can get them from the work directory of the `KRAKEN2_TRIMD` and `KRAKEN2_ASMBLD` processes. Alternatively, you can create your own config file that adds back the publishing of these files.
- Improved linking of taxonomy across modules. Use of the NCBI TaxID in ANI, Assembly_Ratio and GC_Content allows for more standardized comparisons across tools.
- Expanded the available taxa for MLST and SRST2 to match the expansion of PubMLST and other MLST databases.
- MLST profile output now merges novel alleles into a single profile (e.g. if MLST and SRST2 both find a novel allele at the rpoB locus, the output shows rpoB(12*,33~)) instead of showing 2 separate lines/profiles.
- Code base reductions:
- Condensed GENERATE_PIPELINE_STATS modules and subworkflows.
- The GRiPHin module was rewritten to be more idiomatic for Nextflow (i.e. the module runs off input files rather than a directory). Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help with this.
- Removed DETERMINE_TAXA_ID_FAILURE, CREATE_SUMMARY_LINE_FAILURE and GENERATE_PIPELINE_STATS_FAILURE_EXQC modules, by condensing them into DETERMINE_TAXA_ID, CREATE_SUMMARY_LINE and GENERATE_PIPELINE_STATS_FAILURE respectively.
- Haemophilus influenzae and Bordetella pertussis were added as possible taxa to pass to AMRFinder with `--organism`. Burkholderia mallei was moved out of the Burkholderia pseudomallei complex set and is now run as Burkholderia mallei.
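The novel-allele merge described above (e.g. rpoB(12*,33~)) can be sketched as follows. This is a minimal Python illustration, not the PHoeNIx implementation: `merge_profiles` is hypothetical, each tool's calls are assumed to arrive as a locus-to-allele mapping, and the `-` separator between loci is an assumption. Distinct calls at the same locus are kept in order of first appearance; identical calls collapse to one.

```python
from typing import Dict, List


def merge_profiles(profiles: List[Dict[str, str]]) -> str:
    """Merge per-tool allele calls into one profile string.

    Hypothetical sketch: each profile maps locus -> allele label (e.g. '12*').
    Distinct labels at a locus are comma-joined, e.g. rpoB(12*,33~).
    """
    merged: Dict[str, List[str]] = {}
    for profile in profiles:
        for locus, allele in profile.items():
            alleles = merged.setdefault(locus, [])
            if allele not in alleles:
                alleles.append(allele)
    # Loci are separated by '-' here purely for illustration.
    return "-".join(f"{locus}({','.join(alleles)})" for locus, alleles in merged.items())


calls = [{"gyrB": "4", "rpoB": "12*"}, {"gyrB": "4", "rpoB": "33~"}]
print(merge_profiles(calls))  # gyrB(4)-rpoB(12*,33~)
```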
Summary File Changes:
- For SPAdes failures, lack of reads after trimming, or corruption, we simplified the warnings produced in `GRiPHin.py` by suppressing other warnings, as the root cause is the aforementioned failure. Similarly, if the reason for the Auto QC Failure is "Assembly file not found", then only that is reported rather than listing files with unknowns.
- New columns added to GRiPHin summary files: `PHX_Version`, `Final_Taxa_ID` and `ShigaPass_Organism`.
- For alignment across GRiPHin summary files and `Phoenix_Summary.tsv`, the columns `Final_Taxa_ID` and `ShigaPass_Organism` were added to the latter. Additionally, the `Species` column was changed to `FastANI_Organism`, `Taxa_Confidence` to `FastANI_%ID`, and `Taxa_Coverage` to `FastANI_%Coverage`.
- AMRFinder files
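If you need to compare a `Phoenix_Summary.tsv` produced before this release with one produced after it, the three renamed columns can be mapped mechanically. The following is a minimal Python sketch using only the standard library; `upgrade_header` and the example rows are hypothetical. It rewrites only the header and does not add `Final_Taxa_ID` or `ShigaPass_Organism`, since those cannot be derived from an older file.

```python
import csv
import io

# Column renames listed in these release notes for Phoenix_Summary.tsv.
RENAMES = {
    "Species": "FastANI_Organism",
    "Taxa_Confidence": "FastANI_%ID",
    "Taxa_Coverage": "FastANI_%Coverage",
}


def upgrade_header(tsv_text: str) -> str:
    """Map pre-v2.2.0 column names to v2.2.0 names; data rows pass through unchanged."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if rows:
        rows[0] = [RENAMES.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical example input (illustrative values only).
old_tsv = "ID\tSpecies\tTaxa_Confidence\tTaxa_Coverage\nS1\tEscherichia coli\t99.1\t90.5\n"
print(upgrade_header(old_tsv))
```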
Terra.bio Output Updates:
- Columns are now reported based on `*_GRiPHin_Summary.tsv`, except for the columns `BETA_LACTAM_RESISTANCE_GENES`, `OTHER_AR_GENES`, `AMRFINDER_POINT_MUTATIONS`, `HYPERVIRULENCE_GENES` and `PLASMID_INCOMPATIBILITY_REPLICONS`, which still come from the `Phoenix_Summary.tsv` file.
- Added `MLST1_NCBI` and `MLST2_NCBI` columns, which combine the MLST and MLST scheme columns. These new columns are formatted for uploading to NCBI following ARLN guidance.
- `SHIGAPASS_TAXA` is the output of ShigaPass, if it was run. The `TAXA_SOURCE` column states whether ShigaPass was used for the final taxa call.
- `FINAL_TAXA_ID` is the final taxa call for the isolate.
- `N50` is the N50 from QUAST.
- `WARNINGS_COUNT` was changed to `WARNINGS` and now prints out the warnings rather than just a count.
- AMRFinderPlus genes are now reported in the columns `AMRFINDERPLUS_AMR_CLASSES`, `AMRFINDERPLUS_AMR_CORE_GENES`, `AMRFINDERPLUS_AMR_PLUS_GENES`, `AMRFINDERPLUS_AMR_SUBCLASSES`, `AMRFINDERPLUS_STRESS_GENES` and `AMRFINDERPLUS_VIRULENCE_GENES`.
- To reduce the space needed to save PHX output, `*.kraken2_trimd.classifiedreads.txt` and `*.kraken2_wtasmbld.classifiedreads.txt` are no longer output from PHX. `*.kraken2_asmbld.classifiedreads.txt` (a different file from `*.kraken2_wtasmbld.classifiedreads.txt`) was added as an output because it contains taxids. These files aren't really needed except for edge cases such as questions about conflicting results or investigating suspected contamination.
- Due to the deprecation of the `when` block in Nextflow, `when` statements in modules were removed and `.filter{}` is used throughout instead. Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help implementing this.
Fixed Bugs:
- Taxonomy Fixes:
- BAD BUG!! `sort_and_prep_dist.sh` was not evaluating scientific notation, so in some cases exact matches were not being reported. For context, when reviewing our dataset of 71,670 samples, 2,837 (~4%) had scientific notation in their Mash distances, and 31 (0.04%) had a different species when sorted with the scientific notation evaluated compared to what PHX originally reported. All 31 would be considered within the same complex, e.g. E. hormaechei/E. cloacae, E. coli/Shigella, K. michiganensis/K. oxytoca. Thus, the impact on previously reported taxa isn't expected to be large, but this fix may resolve some differences reported between MALDI and WGS.
- ShigaPass was added to distinguish correctly between E. coli and Shigella. If FastANI determines the species to be either E. coli or Shigella, ShigaPass will now run to confirm the call. In GRiPHin there is a new `Final_Taxa_ID` column that has the final determined call. The `Taxa_source` column will still say `ANI_REFSEQ` if the FastANI call was kept, and will now say `Shigapass` if ShigaPass determined the FastANI call to be wrong and it was overwritten. This was added to all entry points.
- Changes were made to allow `-resume` to work better.
- More robust checks in PHoeNIx to correctly pull in only sample names.
- Fixed error that caused the column "No_AR_Genes_Found" to not appear in the GRiPHin report.
- Fixed `--coverage` being converted to a string when run on Seqera Cloud. Thanks to @DOH-JDJ0303 for the PR.
- Changes to how genes are highlighted in the GRiPHin_Summary:
- The Beta-Lactamase DataBase (BLDB) is now used as an input to determine which genes to highlight rather than hard coding. Big 5 genes whose function is labelled as ESBL/IR/IR ESBL are no longer highlighted as part of the big 5 genes, as they are not thought to have carbapenemase activity.
- Full details on highlighting methods can be found in the wiki.
- The column `Kraken_ID_Raw_Reads_%` in the GRiPHin summary files (xlsx and tsv) was changed to `Kraken_ID_Trimmed_Reads_%` to accurately reflect what that column has been reporting... whoopsie.
- Fixed a bug where passing samples were not entering the BBDuk step because their file names contained forward/reverse rather than R1/R2.
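The `sort_and_prep_dist.sh` scientific-notation bug described under Taxonomy Fixes is easy to reproduce. The following is a minimal Python sketch, not the code in that script (whose fix is not reproduced here); the reference names and distances are made up. A parser that reads only the leading digits, roughly how `sort -n` treats text, misreads `2.3e-06` as `2.3`, so a closer hit loses to a farther one. In shell, GNU `sort -g` (general numeric sort) evaluates scientific notation where `sort -n` does not.

```python
import re
from typing import Callable, List, Tuple


def numeric_prefix(value: str) -> float:
    """Naive parse: read only leading digits and one decimal point ('2.3e-06' -> 2.3)."""
    prefix = re.match(r"\d*\.?\d*", value).group(0)
    return float(prefix) if prefix not in ("", ".") else 0.0


def best_hit(hits: List[Tuple[str, str]], parse: Callable[[str], float]) -> str:
    """Return the reference with the smallest Mash distance under the given parser."""
    return min(hits, key=lambda hit: parse(hit[1]))[0]


# Hypothetical (reference, Mash distance as text) pairs.
hits = [("ref_A", "0.000412"), ("ref_B", "2.3e-06"), ("ref_C", "0.00105")]

print(best_hit(hits, numeric_prefix))  # ref_A: '2.3e-06' misread as 2.3
print(best_hit(hits, float))           # ref_B: scientific notation evaluated
```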
Container Updates:
- Containers updated to include the developers' bug fixes:
- amrfinderplus: v3.12.8 to v4.2.5
- busco: v5.4.7--pyhdfd78af_0 to v6.0.0
- bbtools: v39.01 to v39.13
- spades: v3.15.5 to v4.2.0
- quast: v5.0.2 to v5.3.0
- sra-tools: v3.1.1 to v3.2.0--h4304569_0
- entrez-direct: v16.2--he881be0_1 to v24.0--he881be0_0
- MLST: v2.23.0_07282023 to v2.25.0_12312025
- phx_base: Python upgraded from 3.7.12 to 3.12.3; base image updated from Ubuntu 22.04 (jammy) to 24.04.
Database Updates:
- Curated AR gene database was updated on 2025-12-08 (yyyy-mm-dd) to include the new AMRFinder database:
- AMRFinderPlus database
- Version 2025-12-03.1
- ResFinder
- Notably, NDM-58 and NDM-60 were added. See the history.txt file for more details (this new version includes changes from 2024-12-13 to 2025-09-09).
- ARG-ANNOT hasn't changed since last version release.
- The MLST database is now created using the PubMLST API and merged with the unique schemes available on the Pasteur and EnteroBase sites.
- Numerous new schemes were added, including a significant group of organisms that now have more than one scheme.