
improve storage usage, minimize duplication of FASTQ data #109

@gpertea

Description


Just had a local run of SPEAQeasy on a 30-sample dataset where the compressed raw data (fastq.gz) total about 175 GB.
Running on a fast SSD with about 1.8 TB of available storage, the SSD filled up quickly and the pipeline aborted after running out of space. This seems unreasonable.

The main space hog appears to be the use of uncompressed FASTQ files internally, in the working directories. This could and should be avoided: most (all?) programs in the pipeline accept fastq.gz as input, and where a tool does not, the FASTQ data can be decompressed on the fly instead of being written out uncompressed.
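As a rough sketch of the on-the-fly alternative (not SPEAQeasy's actual commands; `wc -l` stands in for a hypothetical downstream tool that expects plain FASTQ):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Make a tiny gzipped FASTQ file for the demo (one 4-line record).
printf '@r1\nACGT\n+\nIIII\n' | gzip > sample.fastq.gz

# Streaming decompression via a pipe: the downstream tool reads plain
# FASTQ from stdin, but no uncompressed copy ever lands on disk.
zcat sample.fastq.gz | wc -l

# Process substitution works for tools that insist on a file path
# rather than stdin; bash exposes the stream as a readable path.
wc -l < <(zcat sample.fastq.gz)
```

Either form keeps only the compressed copy in the work directory; the uncompressed bytes exist transiently in the pipe buffer.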

Metadata

Labels: enhancement (New feature or request)
