I'm running into errors, specifically with ONT data, where make_examples and postprocess_variants seem to fail silently and run out the walltime on the Slurm job.
For example, this command
/opt/deepvariant/bin/postprocess_variants \
--ref "Homo_sapiens_assembly38_masked_noALT.fasta" \
--infile "HG008-ONT.call.tfrecord.gz" \
--outfile "HG008-ONT.vcf.gz" \
--nonvariant_site_tfrecord_path "HG008-ONT.gvcf.tfrecord@00024.gz" \
--gvcf_outfile "HG008-ONT.g.vcf.gz" \
--sample_name HG008-ONT \
--cpus 12
leads to this log:
2026-04-04 01:19:17.166772: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-04-04 01:19:18.142723: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2026-04-04 01:19:19.133450: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2026-04-04 01:19:19.137058: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-04-04 01:19:20.763585: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
I0404 01:19:29.745429 22949602054784 postprocess_variants.py:1480] Using sample name from call_variants output. Sample name: HG008-ONT
I0404 01:19:29.745574 22949602054784 postprocess_variants.py:1485] --sample_name is set but was not used.
I0404 01:19:30.980240 22949602054784 postprocess_variants.py:1735] Running postprocess_variants with parallelism using 12 CPUs over 12 partitions.
I0404 01:21:41.124814 22949602054784 postprocess_variants.py:1346] Processing region chr4:0-chr4:190214555
I0404 01:21:41.125960 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.1677200396855674 minutes
I0404 01:21:41.126940 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:41.126992 22949602054784 postprocess_variants.py:1365] Processed 2979906 variants.
I0404 01:21:41.127068 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.1677387317021686 minutes
I0404 01:21:41.714890 22949602054784 postprocess_variants.py:1346] Processing region chr3:0-chr3:198295559
I0404 01:21:41.715251 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.1775914231936135 minutes
I0404 01:21:41.716210 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:41.716259 22949602054784 postprocess_variants.py:1365] Processed 3088142 variants.
I0404 01:21:41.716333 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.1776097575823465 minutes
I0404 01:21:47.385473 22949602054784 postprocess_variants.py:1346] Processing region chr1:0-chr1:248956422
I0404 01:21:47.385832 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.272206179300944 minutes
I0404 01:21:47.389170 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:47.389217 22949602054784 postprocess_variants.py:1365] Processed 3821722 variants.
I0404 01:21:47.389294 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.272264071305593 minutes
I0404 01:21:50.422237 22949602054784 postprocess_variants.py:1346] Processing region chr2:0-chr2:242193529
I0404 01:21:50.422600 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.32276877562205 minutes
I0404 01:21:50.423541 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:50.423591 22949602054784 postprocess_variants.py:1365] Processed 3863892 variants.
I0404 01:21:50.423666 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.322786756356557 minutes
I0404 01:21:54.706866 22949602054784 postprocess_variants.py:1346] Processing region chr9:0-chr10:133797422
I0404 01:21:54.707232 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.3939231952031452 minutes
I0404 01:21:54.708181 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:54.708232 22949602054784 postprocess_variants.py:1365] Processed 4243004 variants.
I0404 01:21:54.708307 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.3939413189888 minutes
I0404 01:21:56.652044 22949602054784 postprocess_variants.py:1346] Processing region chr13:0-chr15:101991189
I0404 01:21:56.652399 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.4262409011522927 minutes
I0404 01:21:56.653347 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:56.653398 22949602054784 postprocess_variants.py:1365] Processed 4434845 variants.
I0404 01:21:56.653472 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.426258965333303 minutes
I0404 01:21:56.838805 22949602054784 postprocess_variants.py:1346] Processing region chr11:0-chr12:133275309
I0404 01:21:56.839159 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.4294020970662435 minutes
I0404 01:21:56.840102 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:21:56.840152 22949602054784 postprocess_variants.py:1365] Processed 4351241 variants.
I0404 01:21:56.840232 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.4294201771418256 minutes
I0404 01:22:00.139049 22949602054784 postprocess_variants.py:1346] Processing region chr7:0-chr8:145138636
I0404 01:22:00.139398 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.4845094402631123 minutes
I0404 01:22:00.140315 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:22:00.140364 22949602054784 postprocess_variants.py:1365] Processed 4870138 variants.
I0404 01:22:00.140439 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.484527015686035 minutes
I0404 01:22:02.539152 22949602054784 postprocess_variants.py:1346] Processing region chr20:0-chrX:156040895
I0404 01:22:02.539509 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.5242565234502155 minutes
I0404 01:22:02.540465 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:22:02.540513 22949602054784 postprocess_variants.py:1365] Processed 4782710 variants.
I0404 01:22:02.540586 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.5242746671040854 minutes
I0404 01:22:03.098630 22949602054784 postprocess_variants.py:1346] Processing region chr16:0-chr19:58617616
I0404 01:22:03.098975 22949602054784 postprocess_variants.py:1353] CVO sorting took 2.533632683753967 minutes
I0404 01:22:03.099862 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:22:03.099909 22949602054784 postprocess_variants.py:1365] Processed 4986589 variants.
I0404 01:22:03.099981 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 2.5336498181025187 minutes
I0404 01:22:48.051768 22949602054784 postprocess_variants.py:1346] Processing region chrY:0-chrUn_JTFH01001998v1_decoy:2001
I0404 01:22:48.052109 22949602054784 postprocess_variants.py:1353] CVO sorting took 3.282629195849101 minutes
I0404 01:22:48.053007 22949602054784 postprocess_variants.py:1357] Transforming call_variants_output to variants.
I0404 01:22:48.053056 22949602054784 postprocess_variants.py:1365] Processed 31043 variants.
I0404 01:22:48.053124 22949602054784 postprocess_variants.py:1568] Processing variants (and writing to temporary files) took 3.282646294434865 minutes
I0404 02:01:37.957194 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 39.93733240365982 minutes.
I0404 02:01:38.266677 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 39.95232543547948 minutes.
I0404 02:04:05.298418 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 42.29848404725393 minutes.
I0404 02:04:06.535706 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 42.26853257020314 minutes.
I0404 02:06:28.027679 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 44.55532149473826 minutes.
I0404 02:06:55.130117 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 44.97149666547775 minutes.
I0404 02:07:17.104206 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 45.3408441901207 minutes.
I0404 02:08:06.580466 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 46.058006676038104 minutes.
I0404 02:08:19.705912 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 46.28608732620875 minutes.
I0404 02:08:27.090184 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 46.44916099309921 minutes.
I0404 02:32:27.794966 22949602054784 postprocess_variants.py:1603] VCF and gVCF creation took 69.66236266295115 minutes.
[2026-04-04T17:18:06.019] error: Detected 1 oom_kill event in StepId=66364513.batch. Some of the step tasks have been OOM Killed.
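To make the log above easier to read at a glance, here is a minimal sketch of a summary script. It assumes the log has been saved to a file (`postprocess.log` is a hypothetical name) and that the lines keep the formats shown above. `summarize_log` is an illustrative helper, not part of DeepVariant. It compares the number of partitions requested with the number that logged a start and a finish, totals the processed variants, and measures how long the job was silent between the last postprocess_variants line and the Slurm OOM message.

```python
import re
from datetime import datetime

# absl-style lines, e.g. "I0404 02:32:27.794966 ..." (month, day, clock time; no year).
ABSL_RE = re.compile(r"^[IWEF](\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ")
# Slurm OOM line, e.g. "[2026-04-04T17:18:06.019] error: Detected 1 oom_kill event ...".
SLURM_OOM_RE = re.compile(
    r"^\[(\d{4})-(\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)\] error: .*oom_kill"
)
PARTITIONS_RE = re.compile(r"over (\d+) partitions")
PROCESSED_RE = re.compile(r"Processed (\d+) variants\.")


def summarize_log(text):
    """Summarize a postprocess_variants log that ends with a Slurm OOM message."""
    requested = None   # partition count announced at startup
    started = 0        # "Processing region" lines
    written = 0        # "VCF and gVCF creation took" lines
    variants = 0       # sum over "Processed N variants." lines
    last_absl = None   # (month, day, clock) of the last absl line seen
    oom = None         # (year, rest of timestamp) of the OOM line

    for line in text.splitlines():
        absl = ABSL_RE.match(line)
        if absl:
            last_absl = absl.groups()
        slurm = SLURM_OOM_RE.match(line)
        if slurm:
            oom = slurm.groups()
        parts = PARTITIONS_RE.search(line)
        if parts:
            requested = int(parts.group(1))
        processed = PROCESSED_RE.search(line)
        if processed:
            variants += int(processed.group(1))
        if "Processing region" in line:
            started += 1
        if "VCF and gVCF creation took" in line:
            written += 1

    hours_silent = None
    if last_absl and oom:
        # Assumption: the absl lines fall in the same year as the Slurm line.
        year, oom_rest = oom
        month, day, clock = last_absl
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
        t_last = datetime.strptime(f"{year}-{month}-{day}T{clock}", fmt)
        t_oom = datetime.strptime(f"{year}-{oom_rest}", fmt)
        hours_silent = (t_oom - t_last).total_seconds() / 3600

    return {
        "partitions_requested": requested,
        "regions_started": started,
        "partitions_written": written,
        "variants_processed": variants,
        "hours_silent_before_oom": hours_silent,
    }


# Example (hypothetical filename):
# print(summarize_log(open("postprocess.log").read()))
```

In the log above, 12 partitions are requested but only 11 "Processing region" lines and 11 "VCF and gVCF creation" lines appear, and nearly 15 hours pass between the last of those lines (02:32) and the OOM kill (17:18).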