Enabling single-end alignment in ww-star, ww-salmon, and related pipelines (#290)
* Allowing for single-ended alignment in ww-star and ww-sra-star
* Adjusting conditional logic in ww-sra-star
* Enabling single-end alignment functionality in ww-salmon, ww-sra-salmon, and ww-ena-star
* Adding single-end sequencing sample to ww-sra-star and ww-sra-salmon test runs
* Updating ww-sra-salmon test run import URL
* Updating ww-ena-star test run import URL
* Fixing single-end alignment functionality
* Adding acknowledgments in READMEs for Fred Hutch contributors
Co-Authored-By: Alice Berger <ahberger@fredhutch.org>
Co-Authored-By: Janet Young <jayoung@fredhutch.org>
* Updating ww-ena README based on EB's suggestion
* Adjusting acknowledgment sections for ww-sra-star and ww-sra-salmon
Co-Authored-By: Alice Berger <43681151+ahberger@users.noreply.github.com>
---------
Co-authored-by: Janet Young <jayoung@fredhutch.org>
Co-authored-by: Alice Berger <43681151+ahberger@users.noreply.github.com>
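The bullets above describe letting the alignment modules accept single-end as well as paired-end input. As a rough illustration only, the Python sketch below shows how a per-sample paired/single flag might select aligner arguments. It relies on the documented `salmon quant` options `-1`/`-2` (mate files) and `-r` (unmated reads), and on STAR's `--readFilesIn`, which takes one file for single-end data and two for paired-end data. The helper names `salmon_quant_args` and `star_read_files_in` and the "true"/"false" string flag are assumptions made for illustration; this is not the code in ww-salmon or ww-star, whose WDL command blocks are not part of this commit view.

```python
def salmon_quant_args(index_dir, r1, r2, is_paired_end, out_dir, threads=1):
    """Hypothetical helper: build a `salmon quant` argument list for one sample.

    `is_paired_end` is assumed to be the string "true" or "false".
    """
    # -l A asks salmon to infer the library type automatically.
    args = ["salmon", "quant", "-i", index_dir, "-l", "A", "-p", str(threads)]
    if is_paired_end == "true":
        args += ["-1", r1, "-2", r2]  # paired-end: pass both mates
    else:
        args += ["-r", r1]  # single-end: R2 is ignored (may be a placeholder)
    args += ["-o", out_dir]
    return args


def star_read_files_in(r1, r2, is_paired_end):
    """Hypothetical helper: build STAR's read-input arguments for one sample."""
    # STAR reads one file for single-end data and two files for paired-end data.
    files = [r1, r2] if is_paired_end == "true" else [r1]
    args = ["--readFilesIn", *files]
    if r1.endswith(".gz"):
        # Gzipped FASTQ has to be decompressed on the fly.
        args += ["--readFilesCommand", "zcat"]
    return args
```

Branching on a single flag keeps the rest of the command identical for both library types, so only the read-input arguments differ between paired-end and single-end samples.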
modules/ww-ena/README.md
6 additions & 5 deletions
@@ -66,17 +66,18 @@ Downloads sequencing data files from ENA using a search query. Allows filtering
 
 ### `extract_fastq_pairs`
 
-Extracts R1 and R2 FASTQ files from downloaded ENA files for downstream paired-end processing. This task identifies all paired-end FASTQ files by common naming patterns, creates standardized outputs, and automatically extracts the accession ID from each filename. Supports multiple accessions in a single download.
+Extracts FASTQ files from downloaded ENA files for downstream processing. Supports both paired-end and single-end data. This task identifies FASTQ files by common naming patterns, creates standardized outputs, automatically extracts the accession ID from each filename, and detects whether each sample is paired-end or single-end. Supports multiple accessions in a single download.
 
 **Inputs:**
 - `downloaded_files` (Array[File]): Array of files downloaded from ENA (typically from `download_files` task)
 
 **Outputs:**
-- `r1_files` (Array[File]): Array of Read 1 FASTQ files, parallel with `r2_files` and `accessions`
-- `r2_files` (Array[File]): Array of Read 2 FASTQ files, parallel with `r1_files` and `accessions`
-- `accessions` (Array[String]): Array of ENA accession IDs extracted from filenames, parallel with `r1_files` and `r2_files`
+- `r1_files` (Array[File]): Array of Read 1 FASTQ files, parallel with `r2_files`, `accessions`, and `is_paired_end_list`
+- `r2_files` (Array[File]): Array of Read 2 FASTQ files (empty placeholder for single-end samples), parallel with `r1_files`, `accessions`, and `is_paired_end_list`
+- `accessions` (Array[String]): Array of ENA accession IDs extracted from filenames, parallel with `r1_files`, `r2_files`, and `is_paired_end_list`
+- `is_paired_end_list` (Array[String]): Array of strings ("true"/"false") indicating whether each sample is paired-end, parallel with `r1_files`, `r2_files`, and `accessions`
 
-**Usage Note:** This task is designed for FASTQ workflows requiring separate R1/R2 files. It searches for common paired-end naming patterns including `_1.fastq.gz`/`_2.fastq.gz`, `_R1.fastq.gz`/`_R2.fastq.gz`, and their uncompressed equivalents. The accession ID is automatically extracted from each filename (e.g., `ERR000001_1.fastq.gz` → `ERR000001`). The output arrays are parallel, meaning `r1_files[i]`, `r2_files[i]`, and `accessions[i]` all correspond to the same sample. If you're downloading other file formats (BAM, analysis files), you don't need this task.
+**Usage Note:** This task is designed for FASTQ workflows requiring separate R1/R2 files. It searches for common naming patterns including `_1.fastq.gz`/`_2.fastq.gz`, `_R1.fastq.gz`/`_R2.fastq.gz`, and their uncompressed equivalents. The accession ID is automatically extracted from each filename (e.g., `ERR000001_1.fastq.gz` → `ERR000001`). The output arrays are parallel, meaning `r1_files[i]`, `r2_files[i]`, `accessions[i]`, and `is_paired_end_list[i]` all correspond to the same sample. If you're downloading other file formats (BAM, analysis files), you don't need this task.
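The `extract_fastq_pairs` behavior described in the README can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the task's actual implementation. The regular expression, the function name, the choice to sort accessions, and the treatment of a bare `ACC.fastq.gz` file (no mate suffix) as single-end read 1 are all assumptions. The sketch does follow the documented output contract: four parallel lists, an empty placeholder for R2 on single-end samples (an empty string here), and "true"/"false" strings in `is_paired_end_list`.

```python
import re
from pathlib import Path

# Hypothetical pattern covering the README's naming conventions:
# ACC_1/ACC_2 and ACC_R1/ACC_R2, gzipped or uncompressed. A bare ACC.fastq(.gz)
# is ASSUMED here to be a single-end file; the real task may differ.
FASTQ_RE = re.compile(r"^(?P<acc>.+?)(?:_R?(?P<mate>[12]))?\.fastq(?:\.gz)?$")


def extract_fastq_pairs(paths):
    """Return parallel lists (r1_files, r2_files, accessions, is_paired_end_list)."""
    mates = {}  # accession -> {"1": path, "2": path}
    for p in paths:
        m = FASTQ_RE.match(Path(p).name)
        if m is None:
            continue  # not a FASTQ file (e.g. BAM or analysis file); ignore it
        entry = mates.setdefault(m.group("acc"), {})
        if m.group("mate"):
            entry[m.group("mate")] = p  # explicit _1/_R1 or _2/_R2 mate
        else:
            # Bare file: use as read 1 only if no explicit _1/_R1 file exists,
            # so an extra unpaired file never displaces a real mate.
            entry.setdefault("1", p)

    r1_files, r2_files, accessions, is_paired_end_list = [], [], [], []
    for acc in sorted(mates):  # sorted only to make the output deterministic
        files = mates[acc]
        if "1" not in files:
            continue  # an orphan R2 cannot be processed; skip the sample
        paired = "2" in files
        accessions.append(acc)
        r1_files.append(files["1"])
        r2_files.append(files["2"] if paired else "")  # single-end placeholder
        is_paired_end_list.append("true" if paired else "false")
    return r1_files, r2_files, accessions, is_paired_end_list
```

For example, passing `ERR000001_1.fastq.gz`, `ERR000001_2.fastq.gz`, and `SRR000002.fastq.gz` yields `accessions == ["ERR000001", "SRR000002"]` and `is_paired_end_list == ["true", "false"]`, with `""` as the R2 entry for `SRR000002`.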
# import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/refs/heads/fix-sra-star-jyoung/modules/ww-salmon/ww-salmon.wdl" as ww_salmon
+# import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/refs/heads/main/modules/ww-testdata/ww-testdata.wdl" as ww_testdata