{{ render_tsv_table("docs/assets/tables/all_workflows.tsv", sort_by="Name", filters={"Name": "Metabuli"}, columns=["Workflow Type", "Applicable Kingdom", "Last Known Changes", "Command-line Compatibility","Workflow Level", "Dockstore"]) }}
The Metabuli workflow assesses the taxonomic profile of raw sequencing data (FASTQ files).
Metabuli is suitable for classifying both short and long reads by comparing them to reference genomes. Optionally, it can extract the reads assigned to a specific NCBI taxon ID of interest. Metabuli uses a novel k-mer structure, called a "metamer", which incorporates both the DNA sequence for high specificity and amino acid conservation for sensitive homology detection.
The Metabuli_PHB workflow additionally includes read trimming software, Fastp (Illumina) and Porechop (ONT), for adapter trimming (recommended) and basic read preprocessing.
!!! caption "Metabuli Workflow Diagram"

!!! info "Database selection"
    The Metabuli software is database-dependent, and taxonomic assignments are highly sensitive to the database used. An appropriate database should contain the expected organism(s) (e.g. _Escherichia coli_) and other taxa that may be present in the reads (e.g. _Citrobacter freundii_, a common contaminant).
!!! warning "Adjusting computational resources"
    Metabuli can operate within user-specified random-access memory (RAM) limits, but the database must fit in the storage provided. It is therefore advised to adjust the optional `disk_space` input for the Metabuli task according to the size of the database used. Please check the default value in the inputs table below to ensure that it is adequate for your selected database.
| Database name | Database Description | Suggested Applications | GCP URI (for usage in Terra) | Source | Database Size (GB) | Date of Last Update |
|---|---|---|---|---|---|---|
| viral | RefSeq viral + human (T2T-CHM13v2.0) | Viral metagenomics | gs://theiagen-public-resources-rp/reference_data/databases/metabuli/refseq_virus-v223.tar.gz | https://metabuli.steineggerlab.workers.dev/ | 4.0 | 2024/04/01 |
| GTDB | Prokaryote (Complete Genome/Chromosome, CheckM completeness > 90, and contamination < 5) + human (T2T-CHM13v2.0) | Prokaryote metagenomics | gs://theiagen-public-resources-rp/reference_data/databases/metabuli/gtdb.tar.gz | https://metabuli.steineggerlab.workers.dev/ | 68.8 | 2024/04/01 |
??? dna "taxon input parameter"
    Providing a `taxon` (NCBI taxon ID or name) enables read extraction within the workflow. The input taxon is standardized by querying the NCBI taxonomy hierarchy in the `ete4_identify` task. Additionally, a parent taxonomic rank (e.g. "genus", "family", "order") can be set in `ete4_identify` to extract reads at a higher taxonomic level relative to the input taxon.
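The effect of setting a parent rank can be illustrated with a minimal sketch. It is not the `ete4_identify` implementation: the function name `resolve_extraction_taxon` and the hard-coded lineage are hypothetical, and the lineage values are copied from the example Metabuli report shown later on this page. In the workflow itself the lineage would come from the NCBI taxonomy rather than a literal list.

```python
from typing import List, Optional, Tuple

# (taxID, rank, name), ordered from the root towards the input taxon.
# Values copied from the example HIV-1 report on this page.
TaxonEntry = Tuple[int, str, str]

HIV1_LINEAGE: List[TaxonEntry] = [
    (10239, "superkingdom", "Viruses"),
    (2559587, "clade", "Riboviria"),
    (2732397, "kingdom", "Pararnavirae"),
    (2732409, "phylum", "Artverviricota"),
    (2732514, "class", "Revtraviricetes"),
    (2169561, "order", "Ortervirales"),
    (11632, "family", "Retroviridae"),
    (327045, "subfamily", "Orthoretrovirinae"),
    (11646, "genus", "Lentivirus"),
    (11676, "species", "Human immunodeficiency virus 1"),
]


def resolve_extraction_taxon(
    lineage: List[TaxonEntry], parent_rank: Optional[str] = None
) -> TaxonEntry:
    """Pick the taxon whose reads would be extracted (hypothetical helper).

    With no parent rank, the input taxon itself (last lineage entry) is used.
    With a parent rank, the closest ancestor at that rank is used instead.
    """
    if parent_rank is None:
        return lineage[-1]
    for entry in reversed(lineage):
        if entry[1] == parent_rank:
            return entry
    raise ValueError(f"rank '{parent_rank}' not found in lineage")


print(resolve_extraction_taxon(HIV1_LINEAGE))           # the input taxon itself
print(resolve_extraction_taxon(HIV1_LINEAGE, "genus"))  # its parent genus
```

Under these assumptions, requesting the "genus" rank widens extraction from the species to every read classified under the parent genus.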
??? dna "illumina input parameter"
    Setting `illumina` to "true" enables Illumina mode for single-end reads. Providing a `read2` input implicitly sets `illumina` to "true".
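The rule above can be written as a short sketch. The function names are hypothetical and this is not the workflow's own code. The trimmer choice additionally assumes that any run not in Illumina mode is treated as ONT data, which is inferred from the Fastp (Illumina) and Porechop (ONT) pairing described earlier rather than confirmed here.

```python
from typing import Optional


def illumina_mode(illumina: bool = False, read2: Optional[str] = None) -> bool:
    """Return True when the reads should be handled as Illumina data.

    A second read file implies paired-end Illumina data, so it switches
    the mode on even when the flag itself was left unset.
    """
    return illumina or read2 is not None


def trimmer_for(illumina: bool = False, read2: Optional[str] = None) -> str:
    # Assumption: non-Illumina input is ONT and is trimmed with Porechop.
    return "fastp" if illumina_mode(illumina, read2) else "porechop"


print(trimmer_for(illumina=True))               # single-end Illumina
print(trimmer_for(read2="sample_R2.fastq.gz"))  # paired-end, flag implied
print(trimmer_for())                            # neither flag nor read2
```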
!!! caption ""
    === "Metabuli"
        /// html | div[class="searchable-table"]

        {{ render_tsv_table("docs/assets/tables/all_inputs.tsv", input_table=True, filters={"Workflow": "Metabuli"}, columns=["Terra Task Name", "Variable", "Type", "Description", "Default Value", "Terra Status"], sort_by=[("Terra Status", True), "Terra Task Name", "Variable"], indent=8) }}

        ///
{{ include_md("common_text/ete4_identify_task.md") }}
{{ include_md("common_text/fastp_task.md", condition="metabuli") }}
{{ include_md("common_text/porechop_task.md", condition="metabuli") }}
{{ include_md("common_text/metabuli_task.md", condition="metabuli") }}
!!! caption ""
    === "Metabuli"
        /// html | div[class="searchable-table"]

        {{ render_tsv_table("docs/assets/tables/all_outputs.tsv", input_table=False, filters={"Workflow": "Metabuli"}, columns=["Variable", "Type", "Description"], sort_by=["Variable"], indent=8) }}

        ///
The most important output of the Metabuli workflow is the `metabuli_report` file, which breaks down the number of sequences and the percentage of reads assigned to each taxon. A complete description of the report format can be found here.
When assessing the taxonomic identity of a single isolate's sequence, it is normal for a few reads to be assigned to very closely related taxa because of their shared sequence identity. "Very closely related taxa" may be genetically similar species in the same genus, or taxa with which the dominant species has undergone horizontal gene transfer. Reads assigned to unrelated taxa, or a high abundance of reads assigned to these closely related taxa, indicate contamination or sequencing of non-target taxa. Interpretation of the results depends on the biological context.
??? toggle "Example Metabuli report"
    Below is an example `metabuli_report` for a _Human immunodeficiency virus 1_ sample. Only the first 13 lines are shown, since the remaining rows account for <0.08% of the reads and are likely human-derived contamination.

    From this report, we can see that ~98.78% of the reads were assigned at the species level (`species` in the 4th column) to "_Human immunodeficiency virus 1_". ~1.15% of the reads were unclassified, and the remaining <0.08% of reads are annotated as _Homo sapiens_ (not depicted).

    ```
    #clade_proportion clade_count taxon_count rank taxID name
    1.1457 3045 3045 no rank 0 unclassified
    98.8543 262722 1 no rank 1 root
    98.7850 262538 0 superkingdom 10239 Viruses
    98.7843 262536 0 clade 2559587 Riboviria
    98.7843 262536 0 kingdom 2732397 Pararnavirae
    98.7843 262536 0 phylum 2732409 Artverviricota
    98.7843 262536 0 class 2732514 Revtraviricetes
    98.7843 262536 0 order 2169561 Ortervirales
    98.7843 262536 0 family 11632 Retroviridae
    98.7843 262536 0 subfamily 327045 Orthoretrovirinae
    98.7843 262536 0 genus 11646 Lentivirus
    **98.7843 262536 262536 species 11676 Human immunodeficiency virus 1**
    ```
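A report like the one above can also be summarised programmatically. The following is a minimal sketch, assuming the `metabuli_report` file is tab-delimited with the six columns shown in the header line. The helper names `parse_metabuli_report` and `taxa_at_rank` are hypothetical, and the embedded rows are a subset of the example report.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class ReportRow(NamedTuple):
    clade_proportion: float  # percentage of reads in this clade
    clade_count: int         # reads assigned to this taxon or its descendants
    taxon_count: int         # reads assigned directly to this taxon
    rank: str
    tax_id: int
    name: str


def parse_metabuli_report(handle: TextIO) -> List[ReportRow]:
    """Parse a report, assuming tab-delimited columns (an assumption)."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        proportion, clade_count, taxon_count, rank, tax_id, name = fields[:6]
        rows.append(
            ReportRow(float(proportion), int(clade_count), int(taxon_count),
                      rank, int(tax_id), name.strip())
        )
    return rows


def taxa_at_rank(rows: List[ReportRow], rank: str = "species",
                 min_proportion: float = 0.0) -> List[ReportRow]:
    """Return taxa at one rank, most abundant first, above a percentage cutoff."""
    hits = [r for r in rows if r.rank == rank and r.clade_proportion >= min_proportion]
    return sorted(hits, key=lambda r: r.clade_proportion, reverse=True)


# A subset of the example report, rebuilt with tab separators.
EXAMPLE_REPORT = "\n".join([
    "#clade_proportion\tclade_count\ttaxon_count\trank\ttaxID\tname",
    "1.1457\t3045\t3045\tno rank\t0\tunclassified",
    "98.8543\t262722\t1\tno rank\t1\troot",
    "98.7843\t262536\t0\tgenus\t11646\tLentivirus",
    "98.7843\t262536\t262536\tspecies\t11676\tHuman immunodeficiency virus 1",
])

rows = parse_metabuli_report(io.StringIO(EXAMPLE_REPORT))
top_species = taxa_at_rank(rows, rank="species", min_proportion=1.0)
for row in top_species:
    print(f"{row.name} (taxID {row.tax_id}): {row.clade_proportion:.2f}% of reads")
```

Listing every species above a chosen cutoff, rather than only the top hit, makes unexpected taxa easier to spot when checking for contamination. The appropriate cutoff depends on the biological context.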
Krona produces an interactive report that allows hierarchical data, such as the taxonomic classifications from Metabuli, to be explored with zoomable, multi-layered pie charts.
??? toggle "Example Krona report"
    Below is an example of the `krona_html` for a bacterial sample. Taxonomic rank is organised from the centre of the pie chart to the edge, with each slice representing the relative abundance of a given taxon in the sample.

!!! techdetails "Metabuli Technical Details"
    |  | Links |
    | --- | --- |
    | Software Source Code | Metabuli on GitHub |
    | Software Documentation | https://github.com/steineggerlab/Metabuli/blob/master/README.md |
    | Original Publication(s) | Metabuli: sensitive and specific metagenomic classification via joint analysis of amino acid and DNA |