Releases: jtamames/SqueezeMeta
v1.6.1post1
- Fix for yesterday's release, which did not include all the intended features.
v1.6.1
New features
- Added the seqvec2fasta function to SQMtools. It will print a named vector containing sequences (such as the ones used to store contig and ORF sequences in SQM$contigs$seqs and SQM$orfs$seqs) as a single fasta-formatted string.
- The make_databases.pl, download_databases.pl and configure_nodb.pl scripts now perform more error checking after each database creation step, and will call test_install.pl before finishing. This should help detect cases in which database creation was unsuccessful, e.g. due to a failed download.
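The output format described for seqvec2fasta can be pictured with a small Python analogue. This is only a sketch, not the SQMtools implementation: the helper name `seqs_to_fasta` and the sample sequences are made up for illustration, and a dict stands in for R's named vector.

```python
def seqs_to_fasta(seqs):
    """Return one FASTA-formatted string from a name -> sequence mapping.

    Hypothetical helper: a Python stand-in for an R named character vector.
    """
    # One header line (">name") followed by the sequence, per entry.
    return "\n".join(f">{name}\n{seq}" for name, seq in seqs.items())


# Made-up contig names and sequences, for illustration only.
example = {"contig_1": "ATGCGT", "contig_2": "TTGACA"}
fasta = seqs_to_fasta(example)
print(fasta)
```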
Minor changes / bugfixes
- Fixed a bug in remap.pl.
- Fixed a bug introduced in v1.6.0 in which trimmomatic was not being called even when the --cleaning flag was provided.
- Fixed a bug in which single reads were causing problems during assembly.
- Fixed a bug in which cover.pl was using the system's perl interpreter instead of the one in the user environment.
- Improved SQL queries in make_databases.pl to hopefully speed up database creation.
- Fixed an issue in which mothur dependencies were not correctly fulfilled by conda.
- Fixed an issue in which restarting a sequential project failed at step 4.
- Fixed several minor issues with the restart mode.
- Fixed remove_duplicate_markers.pl so it works in the new binning structure.
- Fixed an issue in which SPAdes was using only 400G of memory even if more was available in the system.
- engine="data.table" and tax_mode="prokfilter" are now the default options in loadSQM.
- Fixed an issue in which subsetSamples corrupted the binning information, making it impossible to further subset the resulting object.
- The PDF SQMtools manual is back. Future availability will depend on whether I can keep getting R's clunky latex interface to produce PDFs in which the tables are rendered correctly.
Known issues
- make_databases.pl may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but it still happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 Tb) of free disk space may help. download_databases.pl should be considered the preferred way of quickly getting reasonably up-to-date databases.
v1.6.0 - One egg for many baskets
New features
- The script restart.pl has been removed. Project restart is now achieved by calling SqueezeMeta.pl --restart -p <project_name>. The flags -step <STEP> --force-overwrite can be added to this call in order to restart the pipeline from a specific step.
- Users can now control whether the source of bin taxonomy is the LCA algorithm from SqueezeMeta, or the taxonomic assignment performed by CheckM. This can be controlled with the flag -taxbinmode. Options are s (SqueezeMeta only, default), c (CheckM), s+c (SqueezeMeta, missing ranks will be completed with CheckM taxonomy when possible) or c+s (CheckM, missing ranks will be completed with SqueezeMeta taxonomy when possible).
- Users can now control the minimum percentage of genes from the same taxa needed in order to taxonomically annotate a contig. This can be done with the flag -consensus.
- sqm_longreads.pl will now consider partial hits completely contained inside a long read as valid hits. Before, partial hits were only considered valid if they occurred at the beginning or end of the reads. This has a noticeable impact on the annotation percentages. The old behaviour can be reinstated with the flags -n or -nopartialhits.
- sqm2pavian.pl now works with results from sqm_reads.pl and sqm_longreads.pl.
- Added the option --filter to sqm_mapper.pl. When this flag is present, the script will filter a set of input sequences, returning only the ones that did not map to the reference.
- SQMtools: SQM objects now track the length, abundance, mapped bases, coverage and coverage per million reads of bins. The corresponding matrices can be found under the SQM$bins list. When running subsetContigs, these values will be updated taking into consideration only the contigs from each bin that were selected.
- SQMtools: added the subsetSamples function to generate subsetted SQM objects containing only the requested samples.
- SQMtools: added the plotBins function to generate barcharts with the distribution of bins across samples.
- SQMtools: unmapped reads for functions are no longer tracked, since it led to inconsistent results in some cases (see #442). This also affects the tables generated by sqm2tables.py.
- SQMtools: added the mostVariable function, which will return the most variable rows (based on their coefficient of variation) from a data.frame or matrix. The interface is otherwise similar to the mostAbundant function.
- SQMtools: SQM objects now track the coverage per million reads of orfs, contigs, bins and functions. Each can be accessed inside the corresponding list under the cpm name. "cpm" is also a valid count option for plotFunctions and plotBins.
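The ranking criterion named for mostVariable, the coefficient of variation (standard deviation divided by the mean), can be sketched in Python as follows. This is an illustration under assumptions, not the SQMtools code: the helper `most_variable`, the dict layout and the abundance values are hypothetical, and the sample standard deviation is used here without knowing which estimator SQMtools applies.

```python
from statistics import mean, stdev


def most_variable(table, n=2):
    """Return the names of the n rows with the highest coefficient of variation.

    `table` maps a row name to its per-sample values (hypothetical layout).
    """
    def cv(values):
        m = mean(values)
        # Rows with a zero mean have no defined CV; rank them last.
        return stdev(values) / m if m != 0 else float("-inf")

    return sorted(table, key=lambda name: cv(table[name]), reverse=True)[:n]


# Made-up abundances for three functions across three samples.
abundances = {
    "K00001": [10, 10, 10],   # constant across samples: CV = 0
    "K00002": [1, 10, 100],   # highly variable
    "K00003": [8, 10, 12],    # mildly variable: CV = 0.2
}
top = most_variable(abundances, n=2)
print(top)
```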
Minor changes / bugfixes
- SQMtools will from now on follow the same version numbers as the corresponding SqueezeMeta releases.
- Updated DIAMOND version to 2.0.15.
- Fixed a bug when adding taxonomic assignments to bins, in which a lack of consensus in a high level prevented looking for consensus at deeper levels.
- Fixed a bug in which data.table could make DAStool crash if it was called with a very high number of threads.
- Fixed a bug in which both reads of a pair were counted as mapped even if only one of them actually mapped to the reference. This had little impact in real datasets, but is corrected now.
- Fixed a bug in which custom arguments passed to bowtie2 with -mapping_options conflicted in some cases with the --very-sensitive-local option that we use by default when calling bowtie2. --very-sensitive-local is now skipped when the user provides custom arguments to bowtie2.
- Fixed an uncommon issue in which contigs could end up being assigned to more than one bin after restarting the pipeline.
- Fixed a bug in sqm_longreads.pl when using several input files from the same sample.
- loadSQM now removes redundant info from the orfs and contigs tables when loading a project into SQMtools, resulting in less memory usage.
- Fixed a bug in which loading a project with loadSQM could randomly cause an error.
- We no longer provide a PDF manual for SQMtools. The documentation for each function can still be accessed from the R terminal or RStudio.
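The paired-read fix above amounts to judging each mate on its own. A minimal Python sketch of that idea, using the SAM FLAG bit 0x4 (read unmapped), is shown below. It is not taken from the SqueezeMeta code: the helper `count_mapped_reads` and the flag values are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped


def count_mapped_reads(flags):
    """Count mapped reads from per-read SAM FLAG values.

    Each mate is judged by its own flag, so a pair in which only one
    mate aligned contributes 1, not 2.
    """
    return sum(1 for flag in flags if not flag & UNMAPPED)


# Made-up flags for two pairs (illustration only):
# pair A: both mates mapped (99, 147)
# pair B: first mate mapped but its mate is not (73), second mate unmapped (133)
flags = [99, 147, 73, 133]
mapped = count_mapped_reads(flags)
print(mapped)
```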
Compatibility Changes
- Results generated by previous versions of SqueezeMeta will not load into SQMtools 1.6.0 (which corresponds to SqueezeMeta release 1.6.0). Running 19.getcontigs.pl /path/to/project will make a project generated with SqueezeMeta v1.5 compatible with the new version of SQMtools.
v1.5.2
Minor changes / bugfixes
- Fixed a bug in consensus taxonomy search during binning, in which a bin could get assigned to a low taxonomic rank even if there was no consensus at higher taxonomic ranks.
- Updated DIAMOND version to 2.0.14. This should get rid of several cases in which search against the nr database resulted in out of memory errors.
- Fixed a typo in the PDF manual in which Figure 6 was missing.
v1.5.1
v1.5.0-post.2
Minor changes / bugfixes
- Added a timeout when checking for available download servers, to avoid locks in the database download scripts if a server is down.
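The idea of probing download servers with a timeout can be sketched in Python as below. This is an assumption-laden illustration, not the logic of the SqueezeMeta download scripts: `tcp_probe`, `first_available`, the port, the timeout value and the host names are all hypothetical.

```python
import socket


def tcp_probe(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts alike.
        return False


def first_available(hosts, probe=tcp_probe):
    """Return the first host for which `probe` succeeds, or None if all fail."""
    for host in hosts:
        if probe(host):
            return host
    return None


# Offline illustration with a fake probe and placeholder host names.
mirrors = ["mirror-a.example.org", "mirror-b.example.org"]
chosen = first_available(mirrors, probe=lambda h: h.startswith("mirror-b"))
print(chosen)
```

Because the probe gives up after a fixed time, a server that is down delays the script by at most the timeout instead of blocking it indefinitely.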
v1.5.0-post.1
Minor changes / bugfixes
- Changed the installation instructions so that they recommend using mamba instead of conda for installing SqueezeMeta. The ReadMe and the PDF manual now also indicate how to get mamba working in your base conda environment.
v1.5.0 - Await another voice
New features
- Binning was refurbished. Binners can now be selected from the command line using the -binners option. Options --nomaxbin and --nometabat are now obsolete. Steps 14 (metabat) and 15 (maxbin) were dropped, and replaced by a single step doing all binning. This changed the numbering of subsequent scripts and results. The script versionchange.pl was introduced to make old results compatible with the current version.
- Added the utility script sqm_mapper.pl, which maps reads to a given reference using one of the included sequence aligners (Bowtie2, BWA), and provides estimation of the abundance of the contigs and ORFs in the reference.
- Reworked the utility script sqm_annot.pl. This script performs functional and taxonomic annotation for a set of genes or genomes. Genomes must be nucleotide sequences, while gene sequences can be either nucleotides or amino acids. All sequence files must be in fasta format.
- Added CONCOCT as an extra option for binning.
- Added the possibility of selecting only the functions/taxa of interest when using sqmreads2tables.py to create summary tables from sqm_reads.pl and sqm_longreads.pl projects. This is achieved by passing an extra -q/--query parameter to sqmreads2tables.py. Query syntax is similar to that of anvi-filter-sqm.py.
- Added the utility script add_databases.pl, which will add one or several new databases to the results of an existing project. The script will run DIAMOND searches for the new databases, and then will re-run several SqueezeMeta scripts to include the new database(s) in the existing results. The following scripts will be invoked: 07, 12, 13 and 21.
Minor changes / bugfixes
- Added the --norename flag to SqueezeMeta.pl, to keep the original contig names produced by the assemblers, or already present in the external assembly provided with the -extassembly parameter. Contig names containing underscores may break the pipeline, so use it with caution.
- Added compatibility for anvi'o 7.1 in the anvi-load-sqm.py and anvi-filter-sqm.py scripts.
- Updated canu to version 2.2.
- Updated flye to version 2.9.
- Fixed a bug in which neither the last contig nor its length were included in calculations.
- Changed the automatic calculation of the -b parameter in DIAMOND from free_ram/5 to free_ram/8 to be more conservative with memory usage.
- Fixed a bug in which sqm2anvio.pl would fail if the file names contained the substring "sam" (other than having ".sam" as the extension).
- Added the --very-sensitive-local parameter to bowtie2 calls to increase performance.
- Fixed an issue in the blastxcollapse.pl script that appeared when the number of sequences was smaller than the number of threads.
- minimap2 can now be used as a mapper in the sqm_mapper.pl script.
- Corrected an error in which the RPKM of the contigs was multiplied by 10^9 rather than 10^6.
- Fixed an issue in which the minpath step generated files in the wrong paths.
- SQMtools: Added the metadata_groups parameter to plotTaxonomy, plotFunctions, plotBars and plotHeatmap to divide samples between different subplots.
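For context on the RPKM correction in this list, the standard definition (reads per kilobase per million mapped reads) can be written out as a short Python sketch. The function name `rpkm` and the numbers are made up, and the notes do not show how the SqueezeMeta code arranges its units, so this is only the textbook formula.

```python
def rpkm(mapped_reads, length_bp, total_reads):
    """Reads per kilobase per million mapped reads (standard definition)."""
    length_kb = length_bp / 1e3          # feature length in kilobases
    total_millions = total_reads / 1e6   # library size in millions of reads
    return mapped_reads / length_kb / total_millions


# Made-up numbers: 500 reads on a 2,000 bp contig, 10 million reads in total.
value = rpkm(500, 2_000, 10_000_000)
print(value)
```

With the length in base pairs and the raw read total, the two scalings combine into a single factor of 10^9. If the length is already expressed in kilobases, only the 10^6 factor remains, so applying 10^9 in that case would inflate the values a thousandfold.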
Compatibility Changes
- Results generated by previous versions of SqueezeMeta will not load into SQMtools 0.7.0 (which corresponds to SqueezeMeta release 1.5). We provide the utility script versionchange.pl in order to make older projects compatible with the new versions.
- Conversely, projects generated with SqueezeMeta v1.5 will not load into older versions of SQMtools.
As always, please open an issue if something's not working for you.
v1.4.0 - Not the destination
New features
- Added the utils/sqm_annot.pl utility script, which performs functional and taxonomic annotation for a set of query genes.
- Added rnaspades as a valid option for the -a parameter when calling SqueezeMeta.pl.
- Added the -mapping_options parameter, which allows the user to provide a string with custom parameters to be passed to the read mapping software.
- We have dropped support for the MySQL db interface, as it was complex to maintain and rarely/never used.
- Features (or reads mapping to features) that do not contain protein-coding sequences (and thus can not be classified taxonomically by our LCA method) are now removed from the "Unclassified" category and grouped into the "No CDS" category instead, in both sqm2tables.py and SQMtools. The presence of non-protein-coding genes was a minor issue with metagenomics data, but generated a lot of unexpected Unclassified reads in metatranscriptomes, which can be attributed to the high proportion of rRNAs in those datasets (see #279, and thanks to @seppedm for the heads up!). Hopefully this will help users to notice when this is happening to their data. We recommend ignoring those reads (they are not truly "Unclassified"; rather, they are not classified by SqueezeMeta's main pipeline). From now on the "Unclassified" category is meant to represent only features that were classifiable by our pipeline, but weren't classified due to lack of a sufficiently close homolog in the reference database.
Minor changes / bugfixes
- Updated DIAMOND to v2.0.8 and SPAdes to v3.15.2.
- Fixed a bug in which 01.remap.pl generated temp files colliding with the output of step 10 (references and SAM).
- 04.rundiamond.pl now creates a file named DB_BUILD_DATE in the intermediate directory, which contains information on the nr database version used for taxonomic annotation.
- Fixed nomenclature issues in the 09.summarycontigs3.pl script regarding "super" ranks and the usage of brackets in taxa names.
- Fixed an error introduced in v1.3.1 in which 09.summarycontigs.pl would fail if SqueezeMeta was run in the --euk mode.
- Fixed an error introduced in v1.3.1 in which anvi-load-sqm.py would not work.
- Fixed the redistributed version of samtools not running in Ubuntu20.
- Installation instructions now prioritize conda-forge over bioconda, to fix a bug with the XML::Parser perl library.
- test_install.pl now correctly checks for the XML::Parser perl library.
- download_databases.pl, make_databases.pl, and configure_databases.pl will now try to download their data from a list of available hosts (not many so far, though). This should reduce individual server load and allow people to download the databases even if poor old silvani is down.
- SQMtools: added the option nocds to plotTaxonomy to control how "No CDS" reads (see new features) will be treated during plotting. Possible options are "treat_separately" (plot them into their own category; default), "treat_as_unclassified" (plot them together with the "Unclassified" category) and "ignore" (do not plot them).
- SQMtools: fixed a bug in which combineSQM would fail if data was loaded using the data.table engine.
- SQMtools: fixed loadSQM and subset functions breaking if PFAM annotations were not computed while running SqueezeMeta.
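The three nocds options listed above can be pictured with a small Python sketch acting on a table of per-category counts. This illustrates the described behaviour only, it is not the SQMtools implementation: the helper `apply_nocds`, the dict layout and the counts are hypothetical.

```python
def apply_nocds(abundances, nocds="treat_separately"):
    """Return a copy of `abundances` with the "No CDS" category handled per `nocds`."""
    result = dict(abundances)
    if nocds == "treat_separately":
        # Keep "No CDS" as its own category.
        return result
    no_cds = result.pop("No CDS", 0)
    if nocds == "treat_as_unclassified":
        # Fold the "No CDS" counts into "Unclassified".
        result["Unclassified"] = result.get("Unclassified", 0) + no_cds
    elif nocds != "ignore":
        raise ValueError(f"unknown nocds option: {nocds!r}")
    return result


# Made-up read counts per category.
counts = {"Bacteroidetes": 60, "Unclassified": 10, "No CDS": 30}
merged = apply_nocds(counts, "treat_as_unclassified")
ignored = apply_nocds(counts, "ignore")
print(merged, ignored)
```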
Compatibility Changes
loadSQM(from SQMtools) andsqm2tables.pyandanvi-load-sqm.pywill fail when trying to parse projects created with versions prior to v1.3.1. To make the project work in 1.3.1, please:- Run step 9 again (perl 09.summarycontigs3.pl /path/to/your/project).
- Remove the /path/to/your/project/results/tables, if present.
As always, please open an issue if something's not working for you.
v1.3.1
New features
- Added sqm_mapper.pl, a utility script that maps reads to a given reference using Bowtie2 or BWA and provides estimation of the abundance of the contigs and ORFs in the reference.
- Added make_databases_alt.pl, which works like make_databases.pl but tries to download the data from a different mirror.
Minor changes / bugfixes
- Fixed a bug in which the --cleaning flag would use clean reads for assembly, but all the reads for mapping.
- Added multithreading for parsing SAM results in Step 10.
- Removed RPKM columns from tables 19 and 20.
- test_install.pl now also checks the integrity of the nr.dmnd and taxid.db databases, making it easier to spot a corrupted database installation.
- anvi'o v7 is now officially supported.
- SQMtools: fixed a bug in which subsetting only one ORF would fail.
- SQMtools: fixed a bug that appeared in v1.3, in which Unmapped reads were incorrectly calculated when subsetting SQM objects.
- SQMtools: fixed a bug in which ORF TPMs were not properly rescaled after using rescale=T in any of the subset functions.
- SQMtools: we now use contig taxonomies for the nofilter and prokfilter modes, instead of ORF taxonomies. This gets rid of a bug that appeared in v1.3, in which reads mapping to contigs but not ORFs were incorrectly treated as Unmapped.
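As background for the TPM rescaling fix, TPM values are defined so that they sum to one million, which means a subset has to be renormalised if it is to be read as a TPM table of its own. The Python sketch below shows that generic renormalisation. It is an assumption about what rescaling means here, not the SQMtools code: the helper `rescale_tpm` and the values are hypothetical.

```python
def rescale_tpm(tpm):
    """Rescale TPM values so that the retained features sum to one million."""
    total = sum(tpm.values())
    if total == 0:
        # Nothing to rescale; return an unchanged copy.
        return dict(tpm)
    return {name: value * 1e6 / total for name, value in tpm.items()}


# Made-up TPMs for two ORFs kept after subsetting (they no longer sum to 1e6).
subset = {"orf_1": 200_000.0, "orf_2": 50_000.0}
rescaled = rescale_tpm(subset)
print(rescaled)
```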
Compatibility Changes
loadSQM(from SQMtools) andsqm2tables.pywill fail when trying to parse projects created with versions prior to v1.3.1. To make the project work in 1.3.1, please:- Run step 9 again (
perl 09.summarycontigs3.pl /path/to/your/project). - Remove the
/path/to/your/project/results/tables, if present.
- Run step 9 again (
As always, please open an issue if something's not working for you.