v1.4.0 - Not the destination

@fpusan fpusan released this 20 May 15:04
· 490 commits to master since this release

New features

  • Added the utils/sqm_annot.pl utility script, which performs functional and taxonomic annotation for a set of query genes.
  • Added rnaspades as a valid option for the -a parameter when calling SqueezeMeta.pl.
  • Added the -mapping_options parameter, which allows the user to provide a string with custom parameters to be passed to the read mapping software.
  • We have dropped support for the MySQL db interface, as it was complex to maintain and rarely, if ever, used.
  • Features (or reads mapping to features) that do not contain protein-coding sequences, and thus cannot be classified taxonomically by our LCA method, are now removed from the "Unclassified" category and grouped into the "No CDS" category instead, in both sqm2tables.py and SQMtools. The presence of non-protein-coding genes was a minor issue with metagenomic data, but it generated a lot of unexpected Unclassified reads in metatranscriptomes, which can be attributed to the high proportion of rRNAs in those datasets (see #279, and thanks to @seppedm for the heads up!). Hopefully this will help users notice when this is happening to their data. We recommend ignoring those reads: they are not truly "Unclassified"; rather, they are not classified by SqueezeMeta's main pipeline. From now on, the "Unclassified" category is meant to represent only features that were classifiable by our pipeline but were not classified due to the lack of a sufficiently close homolog in the reference database.
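The distinction between "No CDS" and "Unclassified" described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the actual code in sqm2tables.py or SQMtools; the function name `taxonomy_category`, its arguments, and the toy data are all hypothetical.

```python
from collections import Counter

def taxonomy_category(has_cds, taxon):
    """Return the category under which a feature would be reported.

    has_cds: True if the feature contains a protein-coding sequence.
    taxon:   taxon assigned by the LCA method, or None if none was assigned.
    Both argument names are hypothetical and used for illustration only.
    """
    if not has_cds:
        # e.g. rRNAs: the LCA method cannot classify these at all
        return "No CDS"
    if taxon is None:
        # classifiable, but no sufficiently close homolog in the database
        return "Unclassified"
    return taxon

# Toy features as (has_cds, taxon) pairs; the values are made up.
features = [(True, "Bacteroidetes"), (False, None), (True, None), (False, None)]
counts = Counter(taxonomy_category(has_cds, taxon) for has_cds, taxon in features)
print(dict(counts))
```

In a metatranscriptome with many rRNA reads, most of the previously "Unclassified" counts would be expected to move into "No CDS" under this rule.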

Minor changes / bugfixes

  • Updated DIAMOND to v2.0.8 and SPAdes to v3.15.2.
  • Fixed a bug in which 01.remap.pl generated temp files colliding with the output of step 10 (references and SAM).
  • 04.rundiamond.pl now creates a file named DB_BUILD_DATE in the intermediate directory, which contains information on the nr database version used for taxonomic annotation.
  • Fixed nomenclature issues in the 09.summarycontigs3.pl script regarding "super" ranks and the usage of brackets in taxa names.
  • Fixed an error introduced in v1.3.1 in which 09.summarycontigs.pl would fail if SqueezeMeta was run in the --euk mode.
  • Fixed an error introduced in v1.3.1 in which anvi-load-sqm.py would not work.
  • Fixed the redistributed version of samtools not running on Ubuntu 20.
  • Installation instructions now prioritize conda-forge over bioconda, to fix a bug with the XML::Parser Perl library.
  • test_install.pl now correctly checks for the XML::Parser perl library.
  • download_databases.pl, make_databases.pl, and configure_databases.pl will now try to download their data from a list of available hosts (not many so far, though). This should reduce individual server load and allow people to download the databases even if poor old silvani is down.
  • SQMtools: added the option nocds to plotTaxonomy to control how "No CDS" reads (see new features) will be treated during plotting. Possible options are "treat_separately" (plot them into their own category; default), "treat_as_unclassified" (plot them together with the "Unclassified" category) and "ignore" (do not plot them).
  • SQMtools: fixed a bug in which combineSQM would fail if data was loaded using the data.table engine.
  • SQMtools: fixed loadSQM and the subset functions breaking if PFAM annotations were not computed while running SqueezeMeta.
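The three values of the nocds option of plotTaxonomy can be illustrated with a small Python sketch. plotTaxonomy itself is an R function in SQMtools; the helper `apply_nocds` below and its toy input are hypothetical, and only mimic the behaviour described above on a plain dictionary of read counts.

```python
def apply_nocds(abund, nocds="treat_separately"):
    """Return a copy of `abund` (category -> read count) prepared for plotting.

    Hypothetical helper that mirrors the three documented values of the
    `nocds` option; it is not part of SQMtools.
    """
    valid = ("treat_separately", "treat_as_unclassified", "ignore")
    if nocds not in valid:
        raise ValueError(f"nocds must be one of {valid}")
    out = dict(abund)
    if nocds == "treat_separately":
        # "No CDS" keeps its own category
        return out
    n = out.pop("No CDS", 0)
    if nocds == "treat_as_unclassified":
        # merge "No CDS" reads into "Unclassified"
        out["Unclassified"] = out.get("Unclassified", 0) + n
    # for "ignore", the "No CDS" reads are simply dropped
    return out

# Toy counts; the values are made up.
abund = {"Proteobacteria": 50, "Unclassified": 10, "No CDS": 40}
for mode in ("treat_separately", "treat_as_unclassified", "ignore"):
    print(mode, apply_nocds(abund, mode))
```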

Compatibility Changes

  • loadSQM (from SQMtools), sqm2tables.py, and anvi-load-sqm.py will fail when trying to parse projects created with versions prior to v1.3.1. To make such a project work in v1.3.1, please:
    1. Run step 9 again (perl 09.summarycontigs3.pl /path/to/your/project).
    2. Remove /path/to/your/project/results/tables, if present.
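For users with many old projects, the two steps above could be scripted. The following is a minimal Python sketch, not an official SqueezeMeta utility: the function names are hypothetical, the location of 09.summarycontigs3.pl depends on your installation and must be supplied by you, and results/tables is assumed to be a directory.

```python
import shutil
import subprocess
from pathlib import Path

def rerun_step9_command(project, script="09.summarycontigs3.pl"):
    """Build the command for step 1 (re-running step 9).

    `script` should point to 09.summarycontigs3.pl in your own installation;
    the default is a placeholder, not a real path.
    """
    return ["perl", str(script), str(project)]

def remove_tables(project):
    """Step 2: remove <project>/results/tables if present.

    Returns True if the directory was removed, False if it was absent.
    """
    tables = Path(project) / "results" / "tables"
    if tables.is_dir():
        shutil.rmtree(tables)
        return True
    return False

def upgrade_project(project, script="09.summarycontigs3.pl"):
    """Run both steps in order; stops with an error if step 9 fails."""
    subprocess.run(rerun_step9_command(project, script), check=True)
    remove_tables(project)
```

Calling `upgrade_project("/path/to/your/project", script=...)` with the path to the script in your installation would then apply both steps to one project.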

As always, please open an issue if something's not working for you.