MissingOutputException with directory and GS remote provider #576

@eric-czech

I am trying to run a workflow rule that creates a directory in GS, but Snakemake consistently fails to recognize that the directory exists. The error message recommends using the directory() flag, which I am already doing.

This appears to be related to #396.

Snakemake version

5.22.1

Describe the bug

Output directories are flagged as missing when using GS remote provider.

Logs

The most salient part:

Uploading to remote: rs-ukb/logs/bgen_to_zarr.XY.txt
Finished upload.
ImproperOutputException in line 57 of /workdir/Snakefile:
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule bgen_to_zarr:
rs-ukb/prep-data/gt-imputation/ukb_chrXY.zarr
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 225, in handle_job_success

Full log: error_log.txt

Minimal example

Here is the offending rule. I apologize that it isn't fully reproducible; some of the details are difficult to share:

def bgen_samples_path(wc):
    n_samples = bgen_contigs.loc[wc.bgen_contig]['n_consent_samples']
    return [f"raw-data/gt-imputation/ukb59384_imp_chr{wc.bgen_contig}_v3_s{n_samples}.sample"]

rule bgen_to_zarr:
    input:
        bgen_path="raw-data/gt-imputation/ukb_imp_chr{bgen_contig}_v3.bgen",
        variants_path="raw-data/gt-imputation/ukb_mfi_chr{bgen_contig}_v3.txt",
        samples_path=bgen_samples_path
    output:
        directory("prep-data/gt-imputation/ukb_chr{bgen_contig}.zarr")
    params:
        contig_index=lambda wc: bgen_contigs.loc[str(wc.bgen_contig)]['index']
    conda:
        "envs/gwas.yaml"
    log:
        "logs/bgen_to_zarr.{bgen_contig}.txt"
    shell:
        # This will write to the local {output} path
        "python scripts/convert.py bgen_to_zarr "
        "--input-path-bgen={input.bgen_path} "
        "--input-path-variants={input.variants_path} "
        "--input-path-samples={input.samples_path} "
        "--output-path={output} "
        "--contig-name={wildcards.bgen_contig} "
        "--contig-index={params.contig_index} "
        "--remote=False 2> {log} "

Invocation:

snakemake --use-conda --cores 1 \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr

I also get the same error when running on a cluster, i.e. using:

snakemake --use-conda --kubernetes \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr

Additional context

I am able to work around this by using an individual checkpoint/sentinel file of some kind, but it's unclear to me if directories are even supported for Google Storage. Is that in the docs somewhere? Am I just trying to use some feature that doesn't exist?
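
A minimal sketch of the sentinel-file workaround mentioned above, assuming the conversion script calls a helper like this after the Zarr directory has been fully written. The function name `write_sentinel` and the `.done` suffix are hypothetical, not taken from the actual pipeline. The rule would then declare the sentinel file as its output instead of the `directory()` target.

```python
from pathlib import Path


def write_sentinel(output_dir, suffix=".done"):
    """Create an empty marker file next to a finished output directory.

    Hypothetical helper: the name and the ".done" suffix are illustrative.
    """
    out = Path(output_dir)
    if not out.is_dir():
        # Refuse to mark the job as complete if the directory was never written.
        raise FileNotFoundError(f"Expected output directory not found: {out}")
    # e.g. prep-data/ukb_chrXY.zarr -> prep-data/ukb_chrXY.zarr.done
    sentinel = out.with_name(out.name + suffix)
    sentinel.touch()
    return sentinel
```

Because the sentinel is a regular file, Snakemake tracks it like any other file output. The trade-off is that the directory itself is no longer a declared output, so it would not be uploaded or tracked automatically.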
