v1.4.0 - Not the destination
New features
- Added the `utils/sqm_annot.pl` utility script, which performs functional and taxonomic annotation for a set of query genes.
- Added `rnaspades` as a valid option for the `-a` parameter when calling `SqueezeMeta.pl`.
- Added the `-mapping_options` parameter, which allows the user to provide a string with custom parameters to be passed to the read mapping software.
- We have dropped support for the MySQL db interface, as it was complex to maintain and rarely/never used.
- Features (or reads mapping to features) that do not contain protein-coding sequences (and thus cannot be classified taxonomically by our LCA method) are now removed from the "Unclassified" category and grouped into the "No CDS" category instead, in both `sqm2tables.py` and `SQMtools`. The presence of non-protein-coding genes was a minor issue with metagenomics data, but generated a lot of unexpected Unclassified reads in metatranscriptomes, which can be attributed to the high proportion of rRNAs in those datasets (see #279, and thanks to @seppedm for the heads up!). Hopefully this will help users notice when this is happening to their data. We recommend ignoring those reads: they are not truly "Unclassified"; rather, they are not classified by SqueezeMeta's main pipeline. From now on, the "Unclassified" category is meant to represent only features that were classifiable by our pipeline but weren't classified due to the lack of a sufficiently close homolog in the reference database.
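
As a rough illustration of the distinction between "No CDS" and "Unclassified", the rule described above can be sketched in a few lines of Python. This is not code from `sqm2tables.py` or `SQMtools`; the function name `assign_category`, its arguments, and the example features are all hypothetical and only mirror the wording of this note.

```python
from collections import Counter


def assign_category(has_cds, taxon):
    """Return the reporting category for one feature (hypothetical helper).

    has_cds: True if the feature contains a protein-coding sequence.
    taxon:   taxon name from the LCA step, or None if nothing was assigned.
    """
    if not has_cds:
        # e.g. rRNAs: the LCA method works on CDS only, so these are never classifiable
        return "No CDS"
    if taxon is None:
        # classifiable, but no sufficiently close homolog in the reference database
        return "Unclassified"
    return taxon


# Made-up features as (has_cds, taxon) pairs, for illustration only
features = [
    (True, "Bacteroides"),
    (True, None),
    (False, None),
    (False, None),
]

counts = Counter(assign_category(has_cds, taxon) for has_cds, taxon in features)
print(dict(counts))
```

In a metatranscriptome with many rRNA reads, most of the counts would be expected to land in "No CDS" rather than inflating "Unclassified".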
Minor changes / bugfixes
- Updated DIAMOND to v2.0.8 and SPAdes to v3.15.2.
- Fixed a bug in which `01.remap.pl` generated temp files colliding with the output of step 10 (references and SAM).
- The `04.rundiamond.pl` script now creates a file named `DB_BUILD_DATE` in the `intermediate` directory, which contains information on the `nr` database version used for taxonomic annotation.
- Fixed nomenclature issues in the `09.summarycontigs3.pl` script regarding "super" ranks and the usage of brackets in taxa names.
- Fixed an error introduced in v1.3.1 in which `09.summarycontigs.pl` would fail if SqueezeMeta was run in the `--euk` mode.
- Fixed an error introduced in v1.3.1 in which `anvi-load-sqm.py` would not work.
- Fixed the redistributed version of `samtools` not running in Ubuntu 20.
- Installation instructions now prioritize `conda-forge` over `bioconda`, to fix a bug with the `XML::Parser` perl library.
- `test_install.pl` now correctly checks for the `XML::Parser` perl library.
- `download_databases.pl`, `make_databases.pl`, and `configure_databases.pl` will now try to download their data from a list of available hosts (not many so far, though). This should reduce individual server load and allow people to download the databases even if poor old silvani is down.
- SQMtools: added the option `nocds` to `plotTaxonomy` to control how "No CDS" reads (see new features) will be treated during plotting. Possible options are "treat_separately" (plot them into their own category; default), "treat_as_unclassified" (plot them together with the "Unclassified" category) and "ignore" (do not plot them).
- SQMtools: fixed a bug in which `combineSQM` would fail if data was loaded using the `data.table` engine.
- SQMtools: fixed `loadSQM` and `subset` functions breaking if PFAM annotations were not computed while running SqueezeMeta.
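
To make the three `nocds` modes listed above more concrete, here is a minimal Python sketch of what each one does to a table of per-category counts. It is only an illustration of the semantics described in this note: SQMtools itself is an R package, and the helper `apply_nocds` and the example counts are hypothetical.

```python
def apply_nocds(counts, nocds="treat_separately"):
    """Return a copy of `counts` adjusted according to the `nocds` mode.

    counts: dict mapping category name -> number of reads (made-up data).
    nocds:  one of the three option values named in the release note.
    """
    valid = ("treat_separately", "treat_as_unclassified", "ignore")
    if nocds not in valid:
        raise ValueError(f"nocds must be one of {valid}")

    out = dict(counts)  # never modify the caller's data
    if nocds == "treat_separately":
        # "No CDS" stays as its own category
        return out

    no_cds = out.pop("No CDS", 0)
    if nocds == "treat_as_unclassified":
        # merge "No CDS" reads into "Unclassified"
        out["Unclassified"] = out.get("Unclassified", 0) + no_cds
    # for "ignore", the "No CDS" reads are simply dropped
    return out


example = {"Bacteroides": 10, "Unclassified": 3, "No CDS": 7}
for mode in ("treat_separately", "treat_as_unclassified", "ignore"):
    print(mode, apply_nocds(example, mode))
```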
Compatibility Changes
- `loadSQM` (from SQMtools), `sqm2tables.py` and `anvi-load-sqm.py` will fail when trying to parse projects created with versions prior to v1.3.1. To make the project work in 1.3.1, please:
  - Run step 9 again (`perl 09.summarycontigs3.pl /path/to/your/project`).
  - Remove the `/path/to/your/project/results/tables` directory, if present.
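
For users with several old projects, the two steps above could be wrapped in a small helper along these lines. This is a sketch under assumptions, not part of SqueezeMeta: the function `upgrade_project` is hypothetical, it assumes `perl` is on the `PATH`, and the location of `09.summarycontigs3.pl` must be supplied for your own installation.

```python
import shutil
import subprocess
from pathlib import Path


def upgrade_project(project_dir, step9_script="09.summarycontigs3.pl",
                    runner=subprocess.run):
    """Re-run step 9 and remove stale tables for one project (hypothetical helper).

    project_dir:  path to the SqueezeMeta project directory.
    step9_script: path to the step 9 script (assumption: adjust to your install).
    runner:       callable used to execute the command; replaceable for dry runs.
    Returns the command that was run and whether a tables directory was removed.
    """
    project = Path(project_dir)

    # Step 1: run step 9 again, as in the command quoted above
    cmd = ["perl", step9_script, str(project)]
    runner(cmd, check=True)

    # Step 2: remove results/tables, if present
    tables = project / "results" / "tables"
    removed = tables.is_dir()
    if removed:
        shutil.rmtree(tables)
    return cmd, removed
```

Calling `upgrade_project("/path/to/your/project")` would run the perl command and then delete the tables directory, so it is worth checking the paths before using something like this on real data.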
As always, please open an issue if something's not working for you.