Releases: jtamames/SqueezeMeta

v1.6.1post1

07 Feb 12:34

  • Fix for yesterday's release, which did not include all the intended features.

v1.6.1

06 Feb 10:23
6643a92

New features

  • Added the seqvec2fasta function to SQMtools. It will print a named vector containing sequences (such as the ones used to store contig and ORF sequences in SQM$contigs$seqs and SQM$orfs$seqs) as a single fasta-formatted string.
  • The make_databases.pl, download_databases.pl and configure_nodb.pl scripts now perform more error checking after each database creation step, and will call test_install.pl before finishing. This should help detect the instances in which database creation was unsuccessful e.g. due to a failed download.
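To illustrate what turning a named sequence vector into a single fasta-formatted string involves, here is a minimal Python sketch. It is not the SQMtools implementation (seqvec2fasta is an R function); the function name seqs_to_fasta and the example sequences are hypothetical.

```python
def seqs_to_fasta(seqs):
    """Return a fasta-formatted string from a mapping of name -> sequence."""
    records = []
    for name, seq in seqs.items():
        # Each record is a '>' header line followed by the sequence itself.
        records.append(f">{name}\n{seq}")
    # Join the records and end with a newline (empty input gives an empty string).
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical sequences, standing in for something like SQM$contigs$seqs.
example = {"contig_1": "ATGCGT", "contig_2": "TTGACA"}
print(seqs_to_fasta(example), end="")
```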

Minor changes / bugfixes

  • Fixed a bug in remap.pl.
  • Fixed a bug introduced in v1.6.0 in which trimmomatic was not being called even when the --cleaning flag was provided.
  • Fixed a bug in which single reads were causing problems during assembly.
  • Fixed a bug in which cover.pl was using the system's perl interpreter instead of the one in the user environment.
  • Improved SQL queries in make_databases.pl to hopefully speed up database creation.
  • Fixed an issue in which mothur dependencies were not correctly fulfilled by conda.
  • Fixed an issue in which restarting a sequential project failed at step 4.
  • Fixed several minor issues with the restart mode.
  • Fixed remove_duplicate_markers.pl so it works in the new binning structure.
  • Fixed an issue in which SPAdes was using only 400G of memory even if more was available in the system.
  • engine="data.table" and tax_mode="prokfilter" are now the default options in loadSQM.
  • Fixed an issue in which subsetSamples corrupted the binning information, making it impossible to further subset the resulting object.
  • The PDF SQMtools manual is back. Future availability will depend on whether I can keep getting R's clunky LaTeX interface to produce PDFs in which the tables are rendered correctly.

Known issues

  • make_databases.pl may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but the slowdown still occurs inconsistently (taking a few hours on some systems, and several days on others). Having a lot (1-2 TB) of free disk space may help. download_databases.pl should be considered the preferred way of quickly getting reasonably up-to-date databases.

v1.6.0 - One egg for many baskets

10 Sep 07:39
21ce1ff

New features

  • The script restart.pl has been removed. Project restart is now achieved by calling SqueezeMeta.pl --restart -p <project_name>. The flags -step <STEP> --force-overwrite can be added to this call in order to restart the pipeline from a specific step.
  • Users can now control whether the source of bin taxonomy is the LCA algorithm from SqueezeMeta, or the taxonomic assignment performed by CheckM. This can be controlled with the flag -taxbinmode. Options are s (SqueezeMeta only, default), c (CheckM), s+c (SqueezeMeta, missing ranks will be completed with CheckM taxonomy when possible) or c+s (CheckM, missing ranks will be completed with SqueezeMeta taxonomy when possible).
  • Users can now control the minimum percentage of genes from the same taxon needed in order to taxonomically annotate a contig. This can be done with the flag -consensus.
  • sqm_longreads.pl will now consider partial hits completely contained inside a long read as valid hits. Before, partial hits were only considered valid if they occurred at the beginning or end of the reads. This has a noticeable impact on the annotation percentages. The old behaviour can be reinstated with the flags -n or -nopartialhits.
  • sqm2pavian.pl now works with results from sqm_reads.pl and sqm_longreads.pl.
  • Added the option --filter to sqm_mapper.pl. When this flag is present, the script will filter a set of input sequences, returning only the ones that did not map to the reference.
  • SQMtools: SQM objects now track the length, abundance, mapped bases, coverage and coverage per million reads of bins. The corresponding matrices can be found under the SQM$bins list. When running subsetContigs, these values will be updated taking into consideration only the contigs from each bin that were selected.
  • SQMtools: added the subsetSamples function to generate subsetted SQM objects containing only the requested samples.
  • SQMtools: added the plotBins function to generate barcharts with the distribution of bins across samples.
  • SQMtools: unmapped reads for functions are no longer tracked, since it led to inconsistent results in some cases (see #442). This also affects the tables generated by sqm2tables.py.
  • SQMtools: added the mostVariable function, which will return the most variable rows (based on their coefficient of variation) from a data.frame or matrix. The interface is otherwise similar to the mostAbundant function.
  • SQMtools: SQM objects now track the coverage per million of reads of orfs, contigs, bins and functions. Each can be accessed inside the corresponding list under the cpm name. "cpm" is also a valid count option for plotFunctions and plotBins.
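The four -taxbinmode options described above can be pictured with the following Python sketch. It only illustrates the "fill in missing ranks from the other source" idea; it is not SqueezeMeta's code. The function names, the rank list and the example taxonomies are hypothetical, and the sketch does not check that the completed lineage is internally consistent.

```python
# Hypothetical rank list, ordered from highest to lowest rank.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def complete_ranks(primary, secondary):
    """Keep every rank from `primary`; fill the missing ones from `secondary`."""
    merged = {}
    for rank in RANKS:
        value = primary.get(rank) or secondary.get(rank)
        if value:
            merged[rank] = value
    return merged


def bin_taxonomy(sqm_tax, checkm_tax, mode="s"):
    """Pick the bin taxonomy according to a -taxbinmode style option."""
    if mode == "s":      # SqueezeMeta only
        return dict(sqm_tax)
    if mode == "c":      # CheckM only
        return dict(checkm_tax)
    if mode == "s+c":    # SqueezeMeta, completed with CheckM
        return complete_ranks(sqm_tax, checkm_tax)
    if mode == "c+s":    # CheckM, completed with SqueezeMeta
        return complete_ranks(checkm_tax, sqm_tax)
    raise ValueError(f"unknown mode: {mode}")


# Made-up example: SqueezeMeta stops at phylum, CheckM reaches class.
sqm = {"superkingdom": "Bacteria", "phylum": "Proteobacteria"}
checkm = {"superkingdom": "Bacteria", "phylum": "Proteobacteria",
          "class": "Gammaproteobacteria"}
print(bin_taxonomy(sqm, checkm, mode="s+c"))
```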

Minor changes / bugfixes

  • SQMtools will from now on follow the same version numbers as the corresponding SqueezeMeta releases.
  • Updated DIAMOND version to 2.0.15.
  • Fixed a bug when adding taxonomic assignments to bins, in which a lack of consensus in a high level prevented looking for consensus at deeper levels.
  • Fixed a bug in which data.table could make DAStool crash if it was called with a very high number of threads.
  • Fixed a bug in which both reads of a pair were counted as mapped even if only one of them actually mapped to the reference. This had little impact on real datasets, but is corrected now.
  • Fixed a bug in which custom arguments passed to bowtie2 with -mapping_options conflicted in some cases with the --very-sensitive-local option that we use by default when calling bowtie2. --very-sensitive-local is now skipped when the user provides custom arguments to bowtie2.
  • Fixed an uncommon issue in which contigs could end up being assigned to more than one bin after restarting the pipeline.
  • Fixed a bug in sqm_longreads.pl when using several input files from the same sample.
  • loadSQM now removes redundant info from the orfs and contigs tables when loading a project into SQMtools, resulting in less memory usage.
  • Fixed a bug in which loading a project with loadSQM could randomly cause an error.
  • We no longer provide a PDF manual for SQMtools. The documentation for each function can still be accessed from the R terminal or RStudio.
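The bowtie2 fix listed above (skipping --very-sensitive-local when custom arguments are given through -mapping_options) amounts to a simple either/or rule. A minimal Python sketch of that rule follows; the function name bowtie2_extra_args is hypothetical and this is not the pipeline's actual code, which is written in Perl.

```python
import shlex

# Preset that the release notes say is used by default when calling bowtie2.
DEFAULT_PRESET = "--very-sensitive-local"


def bowtie2_extra_args(mapping_options=None):
    """Return the extra bowtie2 arguments for a mapping call."""
    if mapping_options:
        # Custom options replace the default preset, so the two cannot conflict.
        return shlex.split(mapping_options)
    return [DEFAULT_PRESET]


print(bowtie2_extra_args())               # default preset only
print(bowtie2_extra_args("--local -N 1")) # user-provided options only
```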

Compatibility Changes

  • Results generated by previous versions of SqueezeMeta will not load into SQMtools 1.6.0 (which corresponds to SqueezeMeta release 1.6.0). Running 19.getcontigs.pl /path/to/project will make a project generated with SqueezeMeta v1.5 compatible with the new version of SQMtools.

v1.5.2

12 Apr 11:14

Minor changes / bugfixes

  • Fixed a bug in consensus taxonomy search during binning, in which a bin could get assigned to a low taxonomic rank even if there was no consensus at higher taxonomic ranks.
  • Updated DIAMOND version to 2.0.14. This should get rid of several cases in which search against the nr database resulted in out of memory errors.
  • Fixed a typo in the PDF manual in which Figure 6 was missing

v1.5.1

20 Jan 17:40

Minor changes / bugfixes

  • Fixes #417, in which flye was missing some necessary binaries

v1.5.0-post.2

10 Jan 15:53

Minor changes / bugfixes

  • Added a timeout when checking for available download servers, to avoid locks in the database download scripts if a server is down.
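As a rough picture of why a timeout avoids such locks, the Python sketch below probes a list of hosts and moves on when one does not answer in time. The function name, the port and the five-second timeout are assumptions for illustration; the release notes do not say how the download scripts implement the check.

```python
import socket


def first_available(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Return the first host that accepts a connection within `timeout` seconds."""
    for host in hosts:
        try:
            # Without a timeout, a server that is down could block this call
            # for a very long time and stall the whole download script.
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers timeouts, refused connections and DNS failures.
            continue
        conn.close()
        return host
    return None
```

The connect parameter exists only so that the function can be exercised without network access, for example by passing a stub that raises OSError for hosts that should look unreachable.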

v1.5.0-post.1

10 Jan 14:33

Minor changes / bugfixes

  • Changed the installation instructions so that they recommend using mamba instead of conda for installing SqueezeMeta. The ReadMe and the PDF manual also now indicate how to get mamba working in your base conda environment.

v1.5.0 - Await another voice

31 Dec 16:52

New features

  • Binning was refurbished. Binners can now be selected from the command line using the -binners option. Options --nomaxbin and --nometabat are now obsolete. Steps 14 (metabat) and 15 (maxbin) were dropped, and replaced by a single step doing all binning. This changed the numbering of subsequent scripts and results. The script versionchange.pl was introduced to make old results compatible with the current version.
  • Added the utility script sqm_mapper.pl, which maps reads to a given reference using one of the included sequence aligners (Bowtie2, BWA), and provides estimation of the abundance of the contigs and ORFs in the reference.
  • Reworked the utility script sqm_annot.pl. This script performs functional and taxonomic annotation for a set of genes or genomes. Genomes must be nucleotide sequences, while gene sequences can be either nucleotides or amino acids. All sequence files must be in fasta format.
  • Added CONCOCT as an extra option for binning.
  • Added the possibility of selecting only the functions/taxa of interest when using sqmreads2tables.py to create summary tables from sqm_reads.pl and sqm_longreads.pl projects. This is achieved by passing an extra -q/--query parameter to sqmreads2tables.py. Query syntax is similar to that of anvi-filter-sqm.py.
  • Added the utility script add_databases.pl, which will add one or several new databases to the results of an existing project. The script will run DIAMOND searches for the new databases, and then will re-run several SqueezeMeta scripts to include the new database(s) to the existing results. The following scripts will be invoked: 07, 12, 13 and 21.

Minor changes / bugfixes

  • Added the --norename flag to SqueezeMeta.pl, to keep the original contig names produced by the assemblers, or already present in the external assembly provided with the -extassembly parameter. Contig names containing underscores may break the pipeline, so use it with caution.
  • Added compatibility for anvi'o 7.1 in the anvi-load-sqm.py and anvi-filter-sqm.py scripts.
  • Updated canu to version 2.2.
  • Updated flye to version 2.9.
  • Fixed a bug in which neither the last contig nor its length were included in calculations.
  • Changed the automatic calculation of the -b parameter in DIAMOND from free_ram/5 to free_ram/8 to be more conservative with memory usage.
  • Fixed a bug in which sqm2anvio.pl would fail if the file names contained the substring "sam" (other than having ".sam" as the extension).
  • Added the --very-sensitive-local parameter to bowtie2 calls to increase performance.
  • Fixed an issue in the blastxcollapse.pl script that appeared when the number of sequences was smaller than the number of threads.
  • minimap2 can now be used as a mapper in the sqm_mapper.pl script.
  • Corrected an error in which the RPKM of the contigs was multiplied by 10^9 rather than 10^6.
  • Fixed an issue in which the minpath step generated files in the wrong paths.
  • SQMtools: Added the metadata_groups parameter to plotTaxonomy, plotFunctions, plotBars and plotHeatmap to divide samples between different subplots.
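The change to the automatic DIAMOND -b value listed above (from free_ram/5 to free_ram/8) is easiest to see with numbers. The Python sketch below assumes free RAM is expressed in GB, which the release notes do not state; the function name and the 64 GB figure are hypothetical.

```python
def diamond_block_size(free_ram, divisor=8):
    """Return the value passed to DIAMOND's -b option for a given amount of free RAM."""
    return free_ram / divisor


free_ram_gb = 64  # hypothetical machine with 64 GB of free RAM
old_value = diamond_block_size(free_ram_gb, divisor=5)  # rule before v1.5.0
new_value = diamond_block_size(free_ram_gb)             # rule from v1.5.0 on
print(f"old -b: {old_value}, new -b: {new_value}")
```

A larger divisor gives a smaller block size, which is what makes the new rule more conservative with memory.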

Compatibility Changes

  • Results generated by previous versions of SqueezeMeta will not load into SQMtools 0.7.0 (which corresponds to SqueezeMeta release 1.5). We provide the utility script versionchange.pl in order to make older projects compatible with the new versions.
  • Conversely, projects generated with SqueezeMeta v1.5 will not load into older versions of SQMtools.

As always, please open an issue if something's not working for you.

v1.4.0 - Not the destination

20 May 15:04

New features

  • Added the utils/sqm_annot.pl utility script, which performs functional and taxonomic annotation for a set of query genes.
  • Added rnaspades as a valid option for the -a parameter when calling SqueezeMeta.pl.
  • Added the -mapping_options parameter, which allows the user to provide a string with custom parameters to be passed to the read mapping software.
  • We have dropped support for the MySQL db interface, as it was complex to maintain and rarely/never used.
  • Features (or reads mapping to features) that do not contain protein-coding sequences (and thus cannot be classified taxonomically by our LCA method) are now removed from the "Unclassified" category and grouped into the "No CDS" category instead, in both sqm2tables.py and SQMtools. The presence of non-protein-coding genes was a minor issue with metagenomics data, but generated a lot of unexpected Unclassified reads in metatranscriptomes, which can be attributed to the high proportion of rRNAs in those datasets (see #279, and thanks to @seppedm for the heads up!). Hopefully this will help users to notice when this is happening to their data. We recommend ignoring those reads (they are not truly "Unclassified"; rather, they are not classified by SqueezeMeta's main pipeline). From now on the "Unclassified" category is meant to represent only features that were classifiable by our pipeline, but weren't classified due to lack of a sufficiently close homolog in the reference database.

Minor changes / bugfixes

  • Updated DIAMOND to v2.0.8 and SPAdes to v3.15.2.
  • Fixed a bug in which 01.remap.pl generated temp files colliding with the output of step 10 (references and SAM).
  • 04.rundiamond.pl now creates a file named DB_BUILD_DATE in the intermediate directory, which contains information on the nr database version used for taxonomic annotation.
  • Fixed nomenclature issues in the 09.summarycontigs3.pl script regarding "super" ranks and the usage of brackets in taxa names.
  • Fixed an error introduced in v1.3.1 in which 09.summarycontigs.pl would fail if SqueezeMeta was run in the --euk mode.
  • Fixed an error introduced in v1.3.1 in which anvi-load-sqm.py would not work.
  • Fixed the redistributed version of samtools not running in Ubuntu20.
  • Installation instructions now prioritize conda-forge over bioconda, to fix a bug with the XML::Parser perl library.
  • test_install.pl now correctly checks for the XML::Parser perl library.
  • download_databases.pl, make_databases.pl, and configure_databases.pl will now try to download their data from a list of available hosts (not many so far, though). This should reduce individual server load and allow people to download the databases even if poor old silvani is down.
  • SQMtools: added the option nocds to plotTaxonomy to control how "No CDS" reads (see new features) will be treated during plotting. Possible options are "treat_separately" (plot them into their own category; default), "treat_as_unclassified" (plot them together with the "Unclassified" category) and "ignore" (do not plot them).
  • SQMtools: fixed a bug in which combineSQM would fail if data was loaded using the data.table engine.
  • SQMtools: fix loadSQM and subset functions breaking if PFAM annotations were not computed while running SqueezeMeta.
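The three values of the nocds option described above can be read as three ways of handling the "No CDS" count before plotting. The Python sketch below shows that logic on a plain dictionary of counts. It is not how plotTaxonomy is implemented (SQMtools is an R package); the function name apply_nocds and the example counts are hypothetical.

```python
VALID_OPTIONS = ("treat_separately", "treat_as_unclassified", "ignore")


def apply_nocds(counts, nocds="treat_separately"):
    """Return a copy of `counts` with the "No CDS" category handled as requested."""
    if nocds not in VALID_OPTIONS:
        raise ValueError(f"nocds must be one of {VALID_OPTIONS}")
    result = dict(counts)
    if nocds == "treat_separately":
        # Keep "No CDS" as its own category.
        return result
    no_cds = result.pop("No CDS", 0)
    if nocds == "treat_as_unclassified":
        # Merge the "No CDS" reads into the "Unclassified" category.
        result["Unclassified"] = result.get("Unclassified", 0) + no_cds
    # For "ignore", the "No CDS" reads are simply dropped.
    return result


# Made-up read counts for a single sample.
counts = {"Bacteria": 70, "Unclassified": 10, "No CDS": 20}
for option in VALID_OPTIONS:
    print(option, apply_nocds(counts, option))
```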

Compatibility Changes

  • loadSQM (from SQMtools) and sqm2tables.py and anvi-load-sqm.py will fail when trying to parse projects created with versions prior to v1.3.1. To make the project work in 1.3.1, please:
    1. Run step 9 again (perl 09.summarycontigs3.pl /path/to/your/project).
    2. Remove the /path/to/your/project/results/tables directory, if present.

As always, please open an issue if something's not working for you.

v1.3.1

01 Feb 20:23

New features

  • Added sqm_mapper.pl, a utility script that maps reads to a given reference using Bowtie2 or BWA and provides estimation of the abundance of the contigs and ORFs in the reference.
  • Added make_databases_alt.pl, which works like make_databases.pl but tries to download the data from a different mirror.

Minor changes / bugfixes

  • Fixed a bug in which the --cleaning flag would use clean reads for assembly, but all the reads for mapping.
  • Added multithreading for parsing SAM results in Step 10.
  • Removed RPKM columns from tables 19 and 20.
  • test_install.pl now also checks the integrity of the nr.dmnd and taxid.db databases, making it easier to spot a corrupted database installation.
  • anvi'o v7 is now officially supported.
  • SQMtools: fixed a bug in which subsetting only one ORF would fail.
  • SQMtools: fixed a bug that appeared in v1.3, in which Unmapped reads were incorrectly calculated when subsetting SQM objects.
  • SQMtools: fixed a bug in which ORF TPMs were not properly rescaled after using rescale=T in any of the subset functions.
  • SQMtools: we now use contig taxonomies for the nofilter and prokfilter modes, instead of ORF taxonomies. This gets rid of a bug that appeared in v1.3, in which reads mapping to contigs but not ORFs were incorrectly treated as Unmapped.

Compatibility Changes

  • loadSQM (from SQMtools) and sqm2tables.py will fail when trying to parse projects created with versions prior to v1.3.1. To make the project work in 1.3.1, please:
    1. Run step 9 again (perl 09.summarycontigs3.pl /path/to/your/project).
    2. Remove the /path/to/your/project/results/tables directory, if present.

As always, please open an issue if something's not working for you.