Releases: jtamames/SqueezeMeta

v1.7.2

14 Jun 06:32

SqueezeMeta

Compatibility notes

We have modified part of the SqueezeMeta.pl interface to make it more internally consistent

  • extassembly and extbins are now treated as assembly modes (together with coassembly, merged, seqmerge and sequential) instead of optional arguments. This aims to avoid the previous scenario in which an assembly mode had to be provided in the command line in order for -extassembly or -extbins to work, even if the pipeline was not actually going to conduct any assembly. An extra parameter -r|-reference can now be used to specify the path to the pre-existing assembly / bin collection
    • -m extassembly -r contigs.fasta replaces the old syntax -m coassembly -extassembly contigs.fasta
    • -m extbins -r bin_directory/ replaces the old syntax -m coassembly -extbins bin_directory/
    • The old syntax will still work properly (so no need to change your scripts) but will emit a warning
  • -p is no longer a mandatory parameter in SqueezeMeta.pl, and instead it defaults to SQM
  • -m sequential will now accept -p, same as the other modes. When running in the sequential mode, the directory specified in -p will first be created, and then populated with the results for the different samples. Before, the results were written directly in the working directory
    • Using -m sequential without -p will still work, but the results will be written into the ./SQM directory instead of the working directory
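
The mapping between the old and the new syntax can be sketched as a small argument-rewriting helper. This is only an illustration: modernize_args is a hypothetical function, not part of SqueezeMeta, and the -s (samples file) and -f (raw reads directory) arguments in the example come from general SqueezeMeta.pl usage rather than from these notes.

```python
def modernize_args(argv):
    """Rewrite pre-1.7.2 -extassembly/-extbins usage to the -m/-r form.

    Hypothetical helper for illustration only; SqueezeMeta itself still
    accepts the old syntax and just emits a warning.
    """
    legacy = {"-extassembly": "extassembly", "-extbins": "extbins"}
    args = list(argv)
    for flag, mode in legacy.items():
        if flag not in args:
            continue
        i = args.index(flag)
        reference = args[i + 1]      # path that followed the old flag
        del args[i:i + 2]            # drop the old flag and its value
        if "-m" in args:             # replace whatever mode was given
            args[args.index("-m") + 1] = mode
        else:
            args += ["-m", mode]
        args += ["-r", reference]
    return args


old = ["SqueezeMeta.pl", "-m", "coassembly", "-extassembly", "contigs.fasta",
       "-s", "my.samples", "-f", "raw_reads"]
new = modernize_args(old)
print(" ".join(new))
```

Since the old syntax still works, a rewrite like this is optional; it only silences the warning.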

New features

  • Added the --fastnr flag to sqm_reads.pl and sqm_longreads.pl, which in turn will pass the --fast flag to DIAMOND when running classification against the nr database. This is significantly faster at the expense of some accuracy

Minor changes / bugfixes

  • Fixed a bug in which step 10 could consume an excessive amount of memory in projects with a high number of samples
  • Fixed a bug resulting in duplicate entries being present in the orftable
  • Fixed a bug in which restarting a project would call the scripts from the version with which the project was created, instead of those from the current version, if the old version was installed in a different path
  • Fixed a bug in which SqueezeMeta.pl would emit a warning but not die when supplied with wrong arguments (e.g. due to a typo when writing the argument name)
  • Fixed a bug in which sqm_reads.pl and sqm_longreads.pl would fail if -p was an absolute path
  • Fixed a bug in sqm2tables.py in which the first ORF of the project would not receive a proper taxonomic assignment
  • Fixed a bug in sqm2tables.py when SqueezeMeta was run with the -D mode

SQMtools

New features

  • Added bindings to convert SQMtools data into microeco and phyloseq objects, enabling the downstream analysis of SqueezeMeta results with both packages. See details here

Minor changes / bugfixes

  • subsetTax now can be used to select more than one taxon at the same time, provided they share the same rank
  • exportPathway now uses group medians instead of group means for log2FC calculation
  • exportPathway now returns a ggplot object
  • GTDB taxonomy, if present, will have its own table under SQM$bins$tax_gtdb
  • We now use Biostrings to store DNA/AA sequences, which should reduce memory footprint
  • Fixed a bug introduced in 1.7.0 in which subsetSamples would throw an error
  • Fixed a bug in which bin abundances were always recalculated upon subset/combine, even if recalculate_bin_stats was set to FALSE
  • Fixed a bug in which running plotFunctions and exportPathway with count="percent" on a subsetted object would renormalize data to 100%, instead of calculating the percentages on the original number of reads before subsetting. Added the rescale_percent flag to revert to the old behaviour if needed
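
The difference between the two percent behaviours can be illustrated with a toy calculation. This is a sketch in Python rather than SQMtools code: the percent function, the counts and the use of a plain per-sample read total are all assumptions made for illustration.

```python
def percent(counts, total_reads=None):
    """Convert counts to percentages.

    If total_reads is given, percentages are relative to that total
    (the behaviour now described for subsetted objects). Otherwise the
    counts are rescaled so that the subset sums to 100% (the old
    behaviour that rescale_percent is described as restoring).
    """
    if total_reads is not None:
        denominator = total_reads
    else:
        denominator = sum(counts.values())
    return {name: 100.0 * n / denominator for name, n in counts.items()}


# Toy sample: 1000 reads in the full project, 200 of which remain after subsetting.
original_total = 1000
subset_counts = {"K00001": 150, "K00002": 50}

relative_to_original = percent(subset_counts, total_reads=original_total)
rescaled = percent(subset_counts)
print(relative_to_original)
print(rescaled)
```

In this toy case K00001 makes up 15% of the original reads but 75% of the subset, which is why the two modes can lead to very different plots.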

v1.7.0 Birds of a feather

31 Mar 21:12

SqueezeMeta

Compatibility note

SqueezeMeta will now expect the CheckM2 database to be present in its database directory. If you had downloaded the SqueezeMeta database before, you can just download that extra file from here (make sure to uncompress it too!)

New features

  • We have revamped all the documentation and moved it to Read The Docs! We will no longer provide a PDF version of the documentation
  • SqueezeMeta can now be used to annotate a set of pre-existing genomes/bins and quantify their abundance in different samples. A directory containing genomes/bins can be provided through the -extbins parameter, which will run the pipeline on that pre-existing set of bins/genomes. This is similar to what -extassembly would do with a single FASTA file, but will treat each FASTA file in the input directory as a different bin
  • SqueezeMeta can now be used to quickly obtain bins from metagenomes, skipping the taxonomic/functional annotation of contigs and ORFs. We have added the --onlybins flag to SqueezeMeta.pl, in order to quickly perform assembly, binning and bin QC/annotation
  • SqueezeMeta can now optionally run GTDB-Tk for the taxonomic classification of bins, if the --gtdbtk flag is provided when calling the pipeline. Note that we do not redistribute the GTDB-Tk databases and they must be obtained separately. By default we expect them to be in a directory named gtdb inside the SqueezeMeta database directory, but a custom location can be provided via the -gtdbtk_data_path argument
  • Switched to using CheckM2 for the calculation of bin completeness/contamination. This gets rid of several bugs related to CheckM1 not having updated its taxonomy to the current standard (e.g. "Pseudomonadota" instead of "Proteobacteria"). As a consequence, strain heterogeneity is no longer available in the bin results (though we've left an empty column there for backwards compatibility reasons)
  • -taxbinmode has been deprecated, as GTDB-Tk can provide better bin-level taxonomies
  • Added the --fastnr flag, which in turn will pass the --fast flag to DIAMOND when running classification against the nr database in Step 4 of the pipeline. This is significantly faster at the expense of some accuracy, but didn't seem to change the results significantly in our test.
  • We have simplified the way we calculate disparity for contigs and bins, see details here
  • sqm2tables.py is now called at the end of SqueezeMeta runs
  • We're moving towards using conda packages rather than vendoring SqueezeMeta's dependencies, see details here
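
The lookup rule for the GTDB-Tk databases described above (a gtdb directory inside the SqueezeMeta database directory, unless -gtdbtk_data_path is given) can be sketched as follows. The function name and the example paths are hypothetical, and this is not SqueezeMeta's own code.

```python
import os


def gtdbtk_data_dir(database_dir, custom_path=None):
    """Return the directory where the GTDB-Tk databases are expected.

    custom_path stands for a value passed via -gtdbtk_data_path;
    database_dir stands for the SqueezeMeta database directory.
    """
    if custom_path:
        return custom_path
    # Default location: a directory named gtdb inside the database directory
    return os.path.join(database_dir, "gtdb")


print(gtdbtk_data_dir("/data/squeezemeta_db"))
print(gtdbtk_data_dir("/data/squeezemeta_db", custom_path="/shared/gtdbtk_release"))
```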

Minor changes / bugfixes

  • Contig names and bin names now start with the project name, to make it easy to distinguish contigs/bins coming from different SqueezeMeta runs
  • Added read group tags identifying the sample from which the reads come to the BAM files produced in step 10
  • Removed the make_databases_alt.pl and configure_nodb_alt.pl scripts, as the standard make_databases.pl, download_databases.pl and configure_nodb.pl scripts are now able to switch to a mirror if our server is unavailable
  • Added the -g parameter which will control the value of the -g|--global-ranking parameter in DIAMOND when running it against the nr database
  • Use forking instead of threads in scripts 06 and 10 to reduce memory usage when multithreading
  • Fixed a bug that prevented sqm_hmm_reads.pl from working since it was trying to download legacy PFAM databases that are no longer reachable
  • Fixed a bug in which some ORFs would be duplicated if the pipeline went through step 13 on restart
  • Fixed the calculation of present pathways in step 20
  • Fixed a bug preventing SqueezeMeta from working with newer versions of MetaBAT2
  • Several bugfixes to SqueezeMeta's behaviour when restarting a run
  • We now use the scaffolds.fasta result instead of the contigs.fasta one when running SPAdes with -a spades or -a spades-base (we still use the transcripts.fasta result when running with -a rnaspades)
  • Fixed a bug in which sqm_annot.pl wasn't passing the right number of threads to subprocesses
  • Fixed a bug in step 10 when the total number of contigs was smaller than the available threads

SQMtools

New features

  • We have revamped all the documentation and moved it to Read The Docs! A PDF version will still be present as part of the CRAN release
  • SQMtools now supports loading more than one project into the same object. loadSQM can now be used to load the output of different SqueezeMeta runs into a single object that can be subsetted and plotted as a standard SQM object (see details here). This facilitates the analysis of e.g. sequential runs in which each sample was processed independently
  • We now provide basic functions for defining/modifying/curating bins within SQMtools, and the possibility of recalculating bin completeness/contamination after adding/removing contigs to the bin (either manually or through a subset function). See details here and here
  • Added exportContigs, exportORFs and exportBins to export the sequences present in a SQM or SQMbunch object
  • We changed the default way of calculating copy numbers from using RecA as a reference to using the median coverage of 10 Universal Single Copy Genes. This behaviour can be controlled via the single_copy_genes parameter in loadSQM
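
The idea behind the new copy number reference can be illustrated with a minimal sketch. It assumes that a copy number is the coverage of a feature divided by the median coverage of the reference single-copy genes in the same sample; the function name, gene labels and values are made up, and these notes do not list which 10 genes SQMtools uses.

```python
from statistics import median


def copy_numbers(feature_coverage, single_copy_coverage):
    """Normalize feature coverages by the median single-copy gene coverage.

    Illustrative only: assumes copy number = coverage / median reference
    coverage, computed within a single sample.
    """
    reference = median(single_copy_coverage.values())
    return {name: cov / reference for name, cov in feature_coverage.items()}


# Made-up coverages for one sample; uscg_5 is a deliberate outlier.
single_copy = {"uscg_1": 9.0, "uscg_2": 10.0, "uscg_3": 10.0,
               "uscg_4": 11.0, "uscg_5": 30.0}
features = {"K00001": 20.0, "K00002": 5.0}

result = copy_numbers(features, single_copy)
print(result)
```

Using a median over several genes means that a single reference gene with unusual coverage (uscg_5 here) does not shift the result, which would not be the case with a single reference such as RecA.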

Minor changes / bugfixes

  • Added the load_sequences argument to loadSQM to control whether contig/ORF sequences should be loaded. Setting it to FALSE will decrease memory usage
  • Added an output_dir parameter to exportPathway
  • Start and end positions of ORFs are now tracked explicitly in SQM$orfs$table
  • copy_number is now the default quantification method used by plotFunctions and exportPathway, when available.
  • Fixed some IDs missing from SQM names and paths vectors after running combineSQMlite
  • Fixed a bug in which the data.table package wasn't attached when loading SQMtools
  • Fixed a bug when subsetting was attempted with only one ORF/contig

v1.6.5post1

06 Jan 12:04

  • Changed conda package recipe to depend only on the conda-forge and bioconda channels.

NOTE: This release is broken in GitHub, but its conda package is ok.

So if you are using conda, you don't have to worry. If you want to use the source code directly please use 1.6.5 instead, as the only changes here are related to conda packaging.

v1.6.5

08 Aug 09:01

  • Fixed a bug in which --cleaning would only use one pair of files per sample, even if more were specified in the samples file.
  • Fixed a bug in which the pipeline would stop with an error at step 10 if the number of mapped reads was too low.
  • This is a fast release aimed to fix a couple of bugs. We have not updated SQMtools or the PDF manual so they both reflect version 1.6.3.

v1.6.4

07 Jul 07:03

  • This changes the way that bin disparity is calculated. Now it will simply be the ratio of contigs disagreeing with the consensus taxonomy. This is faster and leads to comparable results overall. This also fixes an issue in which very large bins (such as eukaryotic bins) could consume a lot of memory during step 16.
  • This is a fast release aimed to fix a single bug. We have not updated SQMtools or the PDF manual so they both reflect version 1.6.3.
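
The simplified disparity can be sketched as follows. Note the assumptions: the consensus is taken here to be just the most frequent taxon among the contigs, which is a stand-in for SqueezeMeta's actual consensus algorithm, and the function name and taxa are made up for illustration.

```python
from collections import Counter


def bin_disparity(contig_taxa):
    """Fraction of contigs whose taxon disagrees with the consensus.

    Assumption: the consensus is the most common taxon in the bin.
    The real pipeline derives its consensus differently.
    """
    if not contig_taxa:
        return 0.0
    _, agreeing = Counter(contig_taxa).most_common(1)[0]
    return (len(contig_taxa) - agreeing) / len(contig_taxa)


# Toy bin: 8 contigs agree with the consensus, 2 do not.
taxa = ["Escherichia"] * 8 + ["Bacillus"] * 2
disparity = bin_disparity(taxa)
print(disparity)
```

A single pass over the contigs is enough for this ratio, which is consistent with the note that the new calculation is faster.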

v1.6.3

20 Sep 07:54

  • Conda installations will now prioritize conda binaries instead of the vendored ones in some cases. This will hopefully fix certain issues in which SqueezeMeta was failing on certain distributions/versions.
  • test_install.pl now performs additional tests to check that binaries can be executed in the current environment.
  • Increased speed and reduced memory usage in step 10 (read counting).
  • Fixed an error in which projects created with the sequential mode would fail to restart. Note that each sample still has to be restarted individually.
  • Fixed an error in which step 16 (DAStool bin merging) would be attempted even if the --nobins flag was provided.
  • SQMtools: fixed an error in exportPathways when the requested KEGG map had only arrows.
  • SQMtools: fixed an error in which figures would not be generated properly when count='percent' was selected if any sample had 0 reads (as could happen when analyzing subsets).

v1.6.2post3

12 Jul 16:13

  • Update SPAdes to 3.15.5 so it works with python 3.10

v1.6.2post2

11 Jul 16:15

  • Upgrade to python 3.10 and improve conda packaging, hopefully fix #705 and be more future-proof

v1.6.2post1

03 May 18:26

  • Fix an issue in which pysam was not properly installed when installing SqueezeMeta through conda

v1.6.2

21 Mar 17:29
0647985

New features

  • Added spades-base as a possible assembler for SqueezeMeta. This will make SqueezeMeta call SPAdes with no additional flags. Flags for SPAdes can then be customized by the user by passing --assembly_options "EXTRA OPTIONS" when calling SqueezeMeta. More information can be found in the ReadMe and the PDF manual.
  • Added the utility script sqm2zip.py, which packs the essential files from a SqueezeMeta project into a single zip file.
  • SQMtools: loadSQM can now load a project directly from a zip file created by sqm2zip.py (the syntax would be loadSQM("/path/to/my_project.zip")).
  • SQMtools: SQMtools is now available in CRAN and can be installed with install.packages("SQMtools") on Windows, Mac and Linux computers.
  • These changes are meant to allow users to easily transfer their data from their clusters/workstations to their personal computers and explore their results there.
  • SQMtools: mostAbundant and mostVariable now accept the argument bycol = TRUE, which will make these functions operate on columns rather than rows.
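
When combining -a spades-base with --assembly_options, the extra SPAdes flags need to reach SqueezeMeta as a single argument. The sketch below builds such a command line as an argument list and prints a shell-quoted version of it. The helper name, project name and file names are hypothetical; -m, -p, -s and -f are standard SqueezeMeta.pl arguments, and --meta and -k are SPAdes flags chosen only as an example.

```python
import shlex


def spades_base_command(project, samples, reads_dir, extra_options):
    """Build a SqueezeMeta.pl call that uses spades-base with custom flags."""
    return [
        "SqueezeMeta.pl",
        "-m", "coassembly",
        "-p", project,
        "-s", samples,
        "-f", reads_dir,
        "-a", "spades-base",
        # All SPAdes flags travel together as one argument
        "--assembly_options", extra_options,
    ]


cmd = spades_base_command("my_project", "my.samples", "raw_reads",
                          "--meta -k 21,33,55")
# shlex.join adds the quoting needed to paste the command into a shell
print(shlex.join(cmd))
```

The list in cmd could also be passed to subprocess.run directly, in which case no manual quoting is needed.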

Minor changes / bugfixes

  • We now use coverage variances in addition to average contig coverages when calling metabat2, which should improve the quality of the resulting bins.
  • Mapping results are now stored as BAM files instead of SAM files, which should reduce disk usage.

Known issues / Other announcements

  • The make_databases.pl script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but it still happens inconsistently (taking a few hours on some systems, and several days on others). Having a lot (1-2 TB) of free disk space may help. download_databases.pl should be considered the preferred way of quickly getting reasonably up-to-date databases.
  • We are discontinuing official support for CentOS7, as its default libraries are too outdated now. We plan on supporting SqueezeMeta in Debian, WSL2-Ubuntu and (hopefully) CentOS Upstream in the not so distant future.