feat: add --ko_input option to skip DIAMOND search using pre-computed KO annotations#155

Open
ynishimuraLv wants to merge 3 commits into chklovski:main from ynishimuraLv:feature/ko-input

@ynishimuraLv

Motivation

DIAMOND-based KO annotation may fail for highly divergent genomes with very fast evolutionary rates (e.g., Candidatus Sukunaarchaeum mirabile), where sequence similarity to reference databases is too low to obtain reliable hits.
This option allows users to provide KO annotations generated by other tools and still leverage CheckM2's quality prediction workflow.

🤖 Developed with assistance from Claude Code

Summary

Adds a new --ko_input option to checkm2 predict that allows users to skip the DIAMOND blastp step by providing
pre-computed KO annotations.

Changes

  • --ko_input: Path to a folder containing per-genome KO annotation files (CSV format with gene_id and
    ko columns)
  • --input: When --ko_input is specified, --input should point to a folder of protein (amino acid) FASTA files
    instead of genome sequences (Prodigal is skipped)
  • Default file extension is set to .faa when --ko_input is used

Annotation file format

Each file in the --ko_input folder should be named {genome_name}.{any_extension} and be a CSV with the
following format:

gene_id,ko
protein_001,K00844
protein_002,K12407
protein_003,
protein_004,K00001
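
For readers preparing these files with another tool, here is a minimal sketch of how a file in this format could be read. It is an illustration only, not the code in this PR: the function name `read_ko_annotations` is hypothetical, and treating an empty `ko` field as "no annotation" is an assumption based on the `protein_003,` row above.

```python
import csv
import io


def read_ko_annotations(handle):
    """Return {gene_id: ko or None} from a CSV with gene_id and ko columns.

    Hypothetical helper for illustration; not part of CheckM2.
    """
    reader = csv.DictReader(handle)
    missing = {"gene_id", "ko"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")

    annotations = {}
    for row in reader:
        # An empty ko field is assumed to mean "gene listed, no KO assigned".
        ko = (row["ko"] or "").strip()
        annotations[row["gene_id"].strip()] = ko or None
    return annotations


# The example file from the description, read from memory instead of disk.
example = io.StringIO(
    "gene_id,ko\n"
    "protein_001,K00844\n"
    "protein_002,K12407\n"
    "protein_003,\n"
    "protein_004,K00001\n"
)
ko_by_gene = read_ko_annotations(example)
print(ko_by_gene)
```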

Validation

  • All gene_id values in the annotation file must be present in the corresponding protein FASTA file; genomes
    that fail this check are skipped with an error log
  • Genes absent from the annotation file (unannotated) are permitted; a warning is logged and prediction
    proceeds
  • --ko_input and --resume cannot be used together
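
The three rules above can be sketched roughly as follows. This is a hedged illustration of the described behaviour, not the implementation in this PR: `fasta_ids`, `check_genome`, `check_options`, and the logger name are all hypothetical, and taking the first whitespace-delimited token of each FASTA header as the gene ID is an assumption.

```python
import logging

logger = logging.getLogger("ko_input_sketch")  # hypothetical logger name


def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def check_genome(genome, annotated_ids, protein_ids):
    """Return True if the genome can be predicted, False if it is skipped."""
    unknown = set(annotated_ids) - set(protein_ids)
    if unknown:
        # Annotated genes missing from the FASTA file: skip this genome.
        logger.error("%s: %d gene_id(s) not found in FASTA; skipping",
                     genome, len(unknown))
        return False

    unannotated = set(protein_ids) - set(annotated_ids)
    if unannotated:
        # Proteins without an annotation row are allowed, but reported.
        logger.warning("%s: %d protein(s) have no annotation row",
                       genome, len(unannotated))
    return True


def check_options(ko_input, resume):
    """Reject the option combination that the PR disallows."""
    if ko_input and resume:
        raise ValueError("--ko_input and --resume cannot be used together")


proteins = fasta_ids([">p1 some description\n", "MKV\n", ">p2\n", "MA\n"])
print(check_genome("genome_A", {"p1"}, proteins))        # passes with a warning
print(check_genome("genome_B", {"p1", "p9"}, proteins))  # skipped: p9 not in FASTA
```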
