Dear SqueezeMeta team,
I have been using SqueezeMeta for a while and it has been very helpful, thanks!
I was wondering whether you have any plans to add gene clustering (e.g., with CD-HIT or MMseqs2) to the pipeline to build a non-redundant gene catalog. This approach is widely used in the field (e.g., https://metagenome-atlas.readthedocs.io/en/latest/usage/output.html#gene-catalog, https://methods-in-microbiomics.readthedocs.io/en/latest/assembly/metagenomic_workflows.html#gene-catalogs) to remove redundancy, aggregate information, and speed up downstream steps (e.g., annotation and data analysis).
Curious to hear your thoughts.