Releases · jtamames/SqueezeMeta

14 Jun 06:32

fpusan

v1.7.2

0236b40

v1.7.2 Latest

SqueezeMeta

Compatibility notes

We have modified part of the SqueezeMeta.pl interface to make it more internally consistent

extassembly and extbins are now treated as assembly modes (together with coassembly, merged, seqmerge and sequential) instead of optional arguments. This aims to avoid the previous scenario in which an assembly mode had to be provided in the command line in order for -extassembly or -extbins to work, even if the pipeline was not actually going to conduct any assembly. An extra parameter -r|-reference can now be used to specify the path to the pre-existing assembly / bin collection
- -m extassembly -r contigs.fasta replaces the old syntax -m coassembly -extassembly contigs.fasta
- -m extbins -r bin_directory/ replaces the old syntax -m coassembly -extbins bin_directory/
- The old syntax will still work properly (so no need to change your scripts) but will emit a warning
-p is no longer a mandatory parameter in SqueezeMeta.pl, and instead it defaults to SQM
-m sequential will now accept -p, same as the other modes. When running in the sequential mode, the directory specified in -p will first be created, and then populated with the results for the different samples. Before, the results were written directly in the working directory
- Using -m sequential without -p will still work, but the results will be written into the ./SQM directory instead of the working directory
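
As an illustration of the syntax change above, here is a minimal sketch of how an old-style argument list maps onto the new one. The helper modernize_args and the LEGACY_FLAGS table are hypothetical names used only for this example; they are not part of SqueezeMeta, and the pipeline itself already accepts the old syntax with a warning.

```python
from typing import List

# Hypothetical mapping (not part of SqueezeMeta): legacy optional flag -> new assembly mode.
LEGACY_FLAGS = {"-extassembly": "extassembly", "-extbins": "extbins"}


def modernize_args(args: List[str]) -> List[str]:
    """Rewrite legacy -extassembly/-extbins usage into the -m <mode> -r <path> form."""
    new_mode = None
    reference = None
    out = []
    i = 0
    while i < len(args):
        if args[i] in LEGACY_FLAGS and i + 1 < len(args):
            # Remember the mode and the path, and drop the legacy flag.
            new_mode = LEGACY_FLAGS[args[i]]
            reference = args[i + 1]
            i += 2
        else:
            out.append(args[i])
            i += 1
    if new_mode is None:
        return out  # nothing to rewrite
    if "-m" in out:
        idx = out.index("-m")
        if idx + 1 < len(out):
            out[idx + 1] = new_mode  # replace the placeholder assembly mode
        else:
            out.append(new_mode)
    else:
        out += ["-m", new_mode]
    return out + ["-r", reference]


if __name__ == "__main__":
    old = ["-m", "coassembly", "-extassembly", "contigs.fasta"]
    print(" ".join(modernize_args(old)))
```

Arguments that do not involve the legacy flags are passed through unchanged, so the sketch is safe to apply to command lines that already use the new syntax.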

New features

Added the --fastnr flag to sqm_reads.pl and sqm_longreads.pl, which in turn will pass the --fast flag to DIAMOND when running classification against the nr database. This is significantly faster at the expense of some accuracy.

Minor changes / bugfixes

Fixed a bug in which step 10 could consume an excessive amount of memory in projects with a high number of samples
Fixed a bug resulting in duplicate entries being present in the orftable
Fixed a bug in which restarting a project would call the scripts from the version with which the project was created, instead of those from the current version, if the old version was installed in a different path
Fixed a bug in which SqueezeMeta.pl would emit a warning but not die when supplied with wrong arguments (e.g. due to a typo when writing the argument name)
Fixed a bug in which sqm_reads.pl and sqm_longreads.pl would fail if -p was an absolute path
Fixed a bug in sqm2tables.py in which the first ORF of the project would not receive a proper taxonomic assignment
Fixed a bug in sqm2tables.py when SqueezeMeta was run with the -D mode

SQMtools

New features

Added bindings to convert SQMtools data into microeco and phyloseq objects, enabling the downstream analysis of SqueezeMeta results with both packages. See details here

Minor changes / bugfixes

subsetTax now can be used to select more than one taxon at the same time, provided they share the same rank
exportPathway now uses group medians instead of group means for log2FC calculation
exportPathway now returns a ggplot object
GTDB taxonomy, if present, will have its own table under SQM$bins$tax_gtdb
We now use Biostrings to store DNA/AA sequences, which should reduce memory footprint
Fixed a bug introduced in 1.7.0 in which subsetSamples would throw an error
Fixed a bug in which bin abundances were always recalculated upon subset/combine, even if recalculate_bin_stats was set to FALSE
Fixed a bug in which running plotFunctions and exportPathway with count="percent" on a subsetted object would renormalize data to 100%, instead of calculating the percentages on the original number of reads before subsetting. Added the rescale_percent flag to revert to the old behaviour if needed
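
The difference between the two percentage behaviours described above can be sketched as follows. This is a plain-Python illustration, not the SQMtools implementation: the function percent_after_subset is a hypothetical name, and using the sum of all feature counts as "the original number of reads" is an assumption made for the example.

```python
from typing import Dict, Iterable


def percent_after_subset(
    counts: Dict[str, int],
    keep: Iterable[str],
    rescale_percent: bool = False,
) -> Dict[str, float]:
    """Percentages for the features in `keep`, for a single sample.

    counts: feature -> read count before subsetting (assumed input format).
    rescale_percent=False: divide by the total before subsetting (new default).
    rescale_percent=True: divide by the total of the subset (old behaviour).
    """
    subset = {f: counts[f] for f in keep}
    denom = sum(subset.values()) if rescale_percent else sum(counts.values())
    if denom == 0:
        # Avoid dividing by zero when a sample has no reads.
        return {f: 0.0 for f in subset}
    return {f: 100.0 * c / denom for f, c in subset.items()}


if __name__ == "__main__":
    counts = {"K1": 50, "K2": 30, "K3": 20}  # made-up counts
    print(percent_after_subset(counts, ["K1", "K2"]))
    print(percent_after_subset(counts, ["K1", "K2"], rescale_percent=True))
```

With the default, the percentages of a subset no longer add up to 100 unless the subset contains every feature, which keeps values comparable between the full object and its subsets.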

31 Mar 21:12

fpusan

v1.7.0

526fcab

v1.7.0 Birds of a feather

SqueezeMeta

Compatibility note

SqueezeMeta will now expect the CheckM2 database to be present in its database directory. If you had downloaded the SqueezeMeta database before, you can just download that extra file from here (make sure to uncompress it too!)

New features

We have revamped all the documentation and moved it to Read The Docs! We will no longer provide a PDF version of the documentation
SqueezeMeta can now be used to annotate a set of pre-existing genomes/bins and quantify their abundance in different samples. A directory containing genomes/bins can be provided through the -extbins parameter, which will run the pipeline on that pre-existing set of bins/genomes. This is similar to what -extassembly would do with a single FASTA file, but will treat each FASTA file in the input directory as a different bin
SqueezeMeta can now be used to quickly obtain bins from metagenomes, skipping the taxonomic/functional annotation of contigs and ORFs. We have added the --onlybins flag to SqueezeMeta.pl, in order to quickly perform assembly, binning and bin QC/annotation
SqueezeMeta can now optionally run GTDB-Tk for the taxonomic classification of bins, if the --gtdbtk flag is provided when calling the pipeline. Note that we do not redistribute the GTDB-Tk databases and they must be obtained separately. By default we expect them to be in a directory named gtdb inside the SqueezeMeta database directory, but a custom location can be provided via the -gtdbtk_data_path argument
Switched to using CheckM2 for the calculation of bin completeness/contamination. This gets rid of several bugs related to CheckM1 not having updated its taxonomy to the current standard (e.g. "Pseudomonadota" instead of "Proteobacteria"). As a consequence, strain heterogeneity is no longer available in the bin results (though we've left an empty column there for backwards compatibility reasons)
-taxbinmode has been deprecated, as GTDB-Tk can provide better bin-level taxonomies
Added the --fastnr flag, which in turn will pass the --fast flag to DIAMOND when running classification against the nr database in Step 4 of the pipeline. This is significantly faster at the expense of some accuracy, but didn't seem to change the results significantly in our test.
We have simplified the way we calculate disparity for contigs and bins, see details here
sqm2tables.py is now called at the end of SqueezeMeta runs
We're moving towards using conda packages rather than vendoring SqueezeMeta's dependencies, see details here

Minor changes / bugfixes

Contig names and bin names now start with the project name, to make it easy to distinguish contigs/bins coming from different SqueezeMeta runs
Added read group tags identifying the sample from which the reads come to the BAM files produced in step 10
Removed the make_databases_alt.pl and configure_nodb_alt.pl scripts, as the standard make_databases.pl, download_databases.pl and configure_nodb.pl scripts are now able to switch to a mirror if our server is unavailable
Added the -g parameter which will control the value of the -g|--global-ranking parameter in DIAMOND when running it against the nr database
Use forking instead of threads in scripts 06 and 10 to reduce memory usage when multithreading
Fixed a bug that prevented sqm_hmm_reads.pl from working since it was trying to download legacy PFAM databases that are no longer reachable
Fixed a bug in which some ORFs would be duplicated if the pipeline went through step 13 on restart
Fixed the calculation of present pathways in step 20
Fixed a bug that prevented SqueezeMeta from working with newer versions of MetaBAT2
Several bugfixes to SqueezeMeta's behaviour when restarting a run
We now use the scaffolds.fasta result instead of the contigs.fasta one when running SPAdes with the -a spades or -a spades-base modes (we still use the transcripts.fasta result when running it with the -a rnaspades mode)
Fixed a bug in which sqm_annot.pl wasn't passing the right number of threads to subprocesses
Fixed a bug in step 10 when the total number of contigs was smaller than the available threads

SQMtools

New features

We have revamped all the documentation and moved it to Read The Docs! A PDF version will still be present as part of the CRAN release
SQMtools now supports loading more than one project into the same object. loadSQM can now be used to load the output of different SqueezeMeta runs into a single object that can be subsetted and plotted as a standard SQM object (see details here). This facilitates the analysis of e.g. sequential runs in which each sample was processed independently
We now provide basic functions for defining/modifying/curating bins within SQMtools, and the possibility of recalculating bin completeness/contamination after adding/removing contigs to the bin (either manually or through a subset function). See details here and here
Added exportContigs, exportORFs and exportBins to export the sequences present in a SQM or SQMbunch object
We changed the default way of calculating copy numbers from using RecA as a reference to using the median coverage of 10 Universal Single Copy Genes. This behaviour can be controlled via the single_copy_genes parameter in loadSQM
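
To illustrate why a median over several single copy genes is a more robust reference than a single gene such as RecA, here is a minimal sketch. It is not the SQMtools implementation: it assumes that a copy number is the coverage of a function divided by the reference coverage, and the function name copy_numbers, the gene identifiers and the coverage values are all made up for the example.

```python
from statistics import median
from typing import Dict


def copy_numbers(
    function_cov: Dict[str, float],
    single_copy_cov: Dict[str, float],
) -> Dict[str, float]:
    """Divide each function's coverage by the median coverage of the reference genes."""
    reference = median(single_copy_cov.values())
    if reference == 0:
        raise ValueError("Reference coverage is zero; copy numbers are undefined")
    return {f: cov / reference for f, cov in function_cov.items()}


if __name__ == "__main__":
    # Ten placeholder single copy genes; one of them has an outlier coverage.
    covs = [8, 10, 12, 9, 11, 10, 10, 30, 10, 10]
    single_copy = {f"USiCG_{i + 1}": float(c) for i, c in enumerate(covs)}
    print(copy_numbers({"K00001": 25.0}, single_copy))
```

In this toy input the outlier gene barely moves the median, whereas using that gene alone as the reference would have changed the resulting copy number threefold.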

Minor changes / bugfixes

Added the load_sequences argument to loadSQM to control whether contig/ORF sequences should be loaded. Setting it to FALSE will decrease memory usage
Added an output_dir parameter to exportPathway
Start and end positions of ORFs are now tracked explicitly in SQM$orfs$table
copy_number is now the default quantification method used by plotFunctions and exportPathway, when available.
Fixed some IDs missing from SQM names and paths vectors after running combineSQMlite
Fixed a bug in which the data.table package wasn't attached when loading SQMtools
Fixed a bug when subsetting was attempted with only one ORF/contig

06 Jan 12:04

fpusan

v1.6.5post1

3a66a99

v1.6.5post1

Changed conda package recipe to depend only on the conda-forge and bioconda channels.

NOTE: This release is broken in GitHub, but its conda package is ok.

So if you are using conda, you don't have to worry. If you want to use the source code directly please use 1.6.5 instead, as the only changes here are related to conda packaging.

08 Aug 09:01

fpusan

v1.6.5

7362ead

v1.6.5

Fixed a bug in which --cleaning would only use one pair of files per sample, even if more were specified in the samples file.
Fixed a bug in which the pipeline would stop with an error at step 10 if the number of mapped reads was too low.
This is a fast release aimed to fix a couple of bugs. We have not updated SQMtools or the PDF manual so they both reflect version 1.6.3.

07 Jul 07:03

fpusan

v1.6.4

370d662

v1.6.4

This changes the way that bin disparity is calculated. It is now simply the ratio of contigs disagreeing with the consensus taxonomy. This is faster and leads to comparable results overall. This also fixes an issue in which very large bins (such as eukaryotic bins) could consume a lot of memory during step 16.
This is a fast release aimed to fix a single bug. We have not updated SQMtools or the PDF manual so they both reflect version 1.6.3.
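
The simplified bin disparity described above can be sketched in a few lines. This is an illustration only, not the code used in the pipeline: the function name bin_disparity is hypothetical, the consensus taxon is taken as an input rather than computed, and how unclassified contigs are treated is left out as an assumption.

```python
from typing import Sequence


def bin_disparity(contig_taxa: Sequence[str], consensus: str) -> float:
    """Fraction of contigs in a bin whose taxon differs from the bin consensus."""
    if not contig_taxa:
        return 0.0  # assumption: an empty bin has no disparity
    disagreeing = sum(1 for taxon in contig_taxa if taxon != consensus)
    return disagreeing / len(contig_taxa)


if __name__ == "__main__":
    taxa = ["Pseudomonadota", "Pseudomonadota", "Bacillota", "Pseudomonadota"]
    print(bin_disparity(taxa, consensus="Pseudomonadota"))
```

Because this is a single pass over the contigs of a bin, the cost grows linearly with bin size, which is consistent with the lower memory usage reported for very large bins.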

20 Sep 07:54

fpusan

v1.6.3

445c45b

v1.6.3

Conda installations will now prioritize conda binaries instead of the vendored ones in some cases. This will hopefully fix certain issues in which SqueezeMeta was failing on certain distributions/versions.
test_install.pl now performs additional tests to check that binaries can be executed in the current environment.
Increased speed and reduced memory usage in step 10 (read counting).
Fixed an error in which projects created with the sequential mode would fail to restart. Note that each sample still has to be restarted individually.
Fixed an error in which step 16 (DAStool bin merging) would be attempted even if the --nobins flag was provided.
SQMtools: fixed an error in exportPathway when the requested KEGG map had only arrows.
SQMtools: fixed an error in which figures would not be generated properly when count='percent' was selected if any sample had 0 reads (as could happen when analyzing subsets).

12 Jul 16:13

fpusan

v1.6.2post3

2fbf69d

v1.6.2post3

Update SPAdes to 3.15.5 so it works with python 3.10

11 Jul 16:15

fpusan

v1.6.2post2

ba9a595

v1.6.2post2

Upgraded to Python 3.10 and improved conda packaging, which will hopefully fix #705 and be more future-proof

03 May 18:26

fpusan

v1.6.2post1

8b9b4e3

v1.6.2post1

Fix an issue in which pysam was not properly installed when installing SqueezeMeta through conda

21 Mar 17:29

fpusan

v1.6.2

0647985

v1.6.2

New features

Added spades-base as a possible assembler for SqueezeMeta. This will make SqueezeMeta call SPAdes with no additional flags. Flags for SPAdes can then be customized by the user by passing --assembly_options "EXTRA OPTIONS" when calling SqueezeMeta. More information can be found in the ReadMe and the PDF manual.
Added the utility script sqm2zip.py, which packs the essential files from a SqueezeMeta project into a single zip file.
SQMtools: loadSQM can now load a project directly from a zip file created by sqm2zip.py (syntax would be loadSQM("/path/to/my_project.zip")).
SQMtools: SQMtools is now available in CRAN and can be installed with install.packages("SQMtools") in Windows, Mac and Linux computers.
These changes are meant to allow users to easily transfer their data from their clusters/workstations to their personal computers and explore their results there.
SQMtools: mostAbundant and mostVariable now accept the argument bycol = TRUE, which will make these functions operate on columns rather than rows.

Minor changes / bugfixes

We now use coverage variances in addition to average contig coverages when calling metabat2, which should improve the quality of the resulting bins.
Mapping results are now stored as BAM files instead of SAM files, which should reduce disk usage.

Known issues / Other announcements

The make_databases.pl script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but it still happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 TB) of free disk space may help. download_databases.pl should be considered the preferred way of quickly getting reasonably up-to-date databases.
We are discontinuing official support for CentOS7, as its default libraries are too outdated now. We plan on supporting SqueezeMeta in Debian, WSL2-Ubuntu and (hopefully) CentOS Upstream in the not so distant future.
