[Metabuli | Fastp] Init Metabuli standalone workflow#1006

Merged
MrTheronJ merged 104 commits into main from kzm-metabuli-dev
Mar 24, 2026

xonq commented Feb 7, 2026

This PR closes #

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

Create a standalone Metabuli workflow that replaces the kraken2_ont workflow with a classifier appropriate for long reads. Metabuli is also compatible with Illumina reads.

As part of this PR, adapter trimming was added to the Metabuli workflow, per the software authors' recommendation. Fastp has had a major release since the version previously used in PHB, so fastp_se and fastp_pe were merged into a single fastp task, and adapter trimming can now be toggled, akin to task_trimmomatic.
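
For reviewers, a rough Python sketch of how a merged single-end/paired-end fastp call with an adapter-trimming toggle could be assembled. This is not the task's actual command block: the function name, output file names, and the default of trimming being off are assumptions for illustration. Only the fastp flags themselves (`--in1`, `--in2`, `--out1`, `--out2`, `--adapter_fasta`, `--detect_adapter_for_pe`, `--disable_adapter_trimming`, `--json`, `--html`) are real fastp options.

```python
from typing import List, Optional


def build_fastp_cmd(
    read1: str,
    read2: Optional[str] = None,
    trim_adapters: bool = False,          # assumed default; the task may differ
    adapter_fasta: Optional[str] = None,
    samplename: str = "sample",           # hypothetical output prefix
) -> List[str]:
    """Assemble a fastp argument list for single- or paired-end input."""
    cmd = ["fastp", "--in1", read1, "--out1", f"{samplename}_1.trim.fastq.gz"]

    # Paired-end mode is enabled simply by the presence of read2.
    if read2 is not None:
        cmd += ["--in2", read2, "--out2", f"{samplename}_2.trim.fastq.gz"]

    if not trim_adapters:
        # fastp trims adapters by default, so the toggle must switch it off.
        cmd.append("--disable_adapter_trimming")
    elif adapter_fasta is not None:
        cmd += ["--adapter_fasta", adapter_fasta]
    elif read2 is not None:
        # Without a supplied FASTA, ask fastp to auto-detect adapters for PE data.
        cmd.append("--detect_adapter_for_pe")

    cmd += ["--json", f"{samplename}.fastp.json", "--html", f"{samplename}.fastp.html"]
    return cmd
```

The point of the sketch is that one code path serves both layouts: `read2` decides the layout, and the trimming toggle only adds or removes adapter-related flags.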

Fastplong (separate software from fastp) was added for ONT trimming.

⚡ Impacted Workflows/Tasks

Workflows

+ metabuli_wf

Δ read_QC_trim_pe

Δ read_QC_trim_se

Δ theiaviral_illumina_pe

Δ theiaviral_ont

Δ theiaviral_panel

Tasks

+ fastplong

Δ fastp

Δ metabuli

This PR may lead to different results in pre-existing outputs: Yes/No

This PR uses an element that could cause duplicate runs to have different results: Yes/No

🛠️ Changes

  • Metabuli wf created
  • Fastp updated v0.2.3 -> v1.1.0
  • Fastp_SE and Fastp_PE merged
  • fastp_trim_adapters added to mirror trimmomatic
  • task_metabuli edited to accommodate paired-end reads
  • task_metabuli edited to optionally extract reads when a taxon ID is provided
  • GTDB metabuli DB added to theiagen-public-resources-rp (may be too big)
  • calls downstream of metabuli in theiaviral_ont are nested in a check for a successful metabuli status
  • fastplong added as the currently supported ONT read-trimming software, staging porechop for deprecation
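
The status check and optional read extraction listed above can be pictured with a small Python sketch. It only illustrates the control flow: the `"PASS"` status string, the step names, and the function itself are hypothetical and are not taken from the workflow code.

```python
from typing import List, Optional


def plan_downstream_calls(metabuli_status: str, taxon: Optional[str] = None) -> List[str]:
    """Return the (hypothetical) downstream steps to run after classification."""
    # Assumption: "PASS" marks a successful run; the real status string may differ.
    if metabuli_status != "PASS":
        return []  # nothing downstream runs when classification did not succeed

    steps: List[str] = []
    if taxon is not None:
        # Read extraction is optional and only happens when a taxon is provided.
        steps.append("extract_reads")
    steps.append("downstream_analysis")
    return steps
```

For example, `plan_downstream_calls("FAIL", "10239")` yields an empty plan, which mirrors the intent of nesting the downstream calls inside the status check.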

⚙️ Algorithm

➡️ Inputs

read_QC_trim_se +7 -4
+ fastp.cpu
+ fastp.disk_size
+ fastp.docker
+ fastp.fastp_adapter_fasta
+ fastp.fastp_trim_adapters
+ fastp.memory
+ fastp.read2
- fastp_se.cpu
- fastp_se.disk_size
- fastp_se.docker
- fastp_se.memory
theiaviral_ont +2 -1
+ metabuli.read2
+ metabuli.taxdump_path
- metabuli.taxonomy_path
theiaviral_illumina_pe +2 -0
+ fastp.fastp_adapter_fasta
+ fastp.fastp_trim_adapters
theiaviral_panel +2 -0
+ fastp.fastp_adapter_fasta
+ fastp.fastp_trim_adapters
read_QC_trim_pe +2 -0
+ fastp.fastp_adapter_fasta
+ fastp.fastp_trim_adapters
metabuli_wf +46 -0
+ call_trim
+ ete4_identify.cpu
+ ete4_identify.disk_size
+ ete4_identify.docker
+ ete4_identify.memory
+ ete4_identify.rank
+ fastp.cpu
+ fastp.disk_size
+ fastp.docker
+ fastp.fastp_adapter_fasta
+ fastp.fastp_args
+ fastp.fastp_min_length
+ fastp.fastp_quality_trim_score
+ fastp.fastp_window_size
+ fastp.memory
+ fastplong.cpu
+ fastplong.cut_front
+ fastplong.cut_tail
+ fastplong.disk_size
+ fastplong.docker
+ fastplong.fastplong_adapter_fasta
+ fastplong.fastplong_args
+ fastplong.fastplong_end_adapter
+ fastplong.fastplong_min_length
+ fastplong.fastplong_quality_trim_score
+ fastplong.fastplong_start_adapter
+ fastplong.fastplong_trim_adapters
+ fastplong.fastplong_window_size
+ fastplong.memory
+ illumina
+ metabuli.cpu
+ metabuli.docker
+ metabuli.extract_unclassified
+ metabuli.min_percent_coverage
+ metabuli.min_score
+ metabuli.min_sp_score
+ metabuli.taxdump_path
+ metabuli_db
+ metabuli_disk_size
+ metabuli_mem
+ read1
+ read2
+ samplename
+ taxon
+ version_capture.docker
+ version_capture.timezone
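
To help read the metabuli_wf input list, here is a hedged Python sketch of how the top-level `illumina`, `call_trim`, and `read2` inputs might select a trimmer and a Metabuli sequencing mode. The semantics of `illumina` and `call_trim` are inferred from their names, and both function names are hypothetical. The mode numbers follow Metabuli's `--seq-mode` option (1 single-end, 2 paired-end, 3 long-read).

```python
from typing import Optional


def select_trimmer(illumina: bool, call_trim: bool) -> Optional[str]:
    """Pick the trimming task name, or None when trimming is switched off."""
    if not call_trim:
        return None
    # Assumption: short reads go to fastp, ONT reads go to fastplong.
    return "fastp" if illumina else "fastplong"


def metabuli_seq_mode(illumina: bool, read2: Optional[str] = None) -> int:
    """Map the read layout onto Metabuli's --seq-mode value."""
    if not illumina:
        return 3  # long-read mode
    return 2 if read2 is not None else 1  # paired-end vs single-end


# Example: an ONT sample with trimming enabled.
trimmer = select_trimmer(illumina=False, call_trim=True)
mode = metabuli_seq_mode(illumina=False)
```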

⬅️ Outputs

theiaviral_ont +1 -0
+ metabuli_status
metabuli_wf +26 -0
+ ete4_docker
+ ete4_version
+ fastp_docker
+ fastp_html_report
+ fastp_json_report
+ fastp_read1_trimmed
+ fastp_read2_trimmed
+ fastp_version
+ fastplong_docker
+ fastplong_html_report
+ fastplong_json_report
+ fastplong_read1_trimmed
+ fastplong_version
+ metabuli_classified_read1
+ metabuli_classified_read2
+ metabuli_classified_report
+ metabuli_docker
+ metabuli_krona_report
+ metabuli_report
+ metabuli_status
+ metabuli_version
+ metabuli_wf_analysis_date
+ metabuli_wf_version
+ ncbi_read_extraction_rank
+ ncbi_taxon_id
+ ncbi_taxon_name

🧪 Testing

Suggested Scenarios for Reviewer to Test

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable and follow the documentation style guide
    • You have updated the "Last Known Changes" field for any affected workflows in the respective workflow documentation page and for the entry in the docs/assets/tables/all_workflows.tsv table to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated and adheres to the documentation style guide

xonq changed the title from "[Metabuli] Init Metabuli standalone workflow" to "[Metabuli | Fastp] Init Metabuli standalone workflow" on Feb 10, 2026
cimendes previously approved these changes Mar 5, 2026
xonq commented Mar 20, 2026

Additional testing following merge conflict resolution:

xonq commented Mar 23, 2026

Thanks for the great feedback, @MrTheronJ - all requests have been resolved and tests have been relaunched for 5 samples of TheiaViral_ONT and Metabuli standalone:

xonq commented Mar 23, 2026

Hardened the regex further here (the failure was a transient URL handshake error unrelated to this PR):

MrTheronJ merged commit 6a0f127 into main on Mar 24, 2026
9 checks passed
MrTheronJ deleted the kzm-metabuli-dev branch on March 24, 2026 16:04
3 participants