Commit 7c73025

[Zip_Column_Content] Resolve overwriting of identical filenames (#1009)
* no overwriting in zip
* simplification improvements
* docs version
* adding file translation tracking
* adding header
* output docs
* wf docs
1 parent 86859de commit 7c73025

5 files changed: +37 -5 lines changed

docs/assets/tables/all_outputs.tsv

Lines changed: 1 addition & 0 deletions
@@ -379,6 +379,7 @@ fastqc_raw2_html File An HTML file that provides a graphical visualization of ra
 fastqc_version String Version of fastqc software used Freyja_FASTQ, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, TheiaEuk_Illumina_PE, TheiaMeta_Illumina_PE, TheiaProk_Illumina_PE, TheiaProk_Illumina_SE
 fetch_srr_accession_analysis_date String The date the fetch_srr_accession analysis was run. Fetch_SRR_Accession
 fetch_srr_accession_version String The version of the fetch_srr_accession workflow. Fetch_SRR_Accession
+file_translations File A tracking file to use for referencing original filenames and paths when identical files are indexed. Zip_Column_Content
 filtered_contigs_metrics File File containing metrics of contigs filtered TheiaEuk_Illumina_PE, TheiaEuk_ONT, TheiaProk_Illumina_PE, TheiaProk_Illumina_SE, TheiaProk_ONT
 flu_A_315675_resistance String resistance mutations to A_315675 TheiaCoV_FASTA, TheiaCoV_Illumina_PE, TheiaCoV_ONT
 flu_amantadine_resistance String resistance mutations to amantadine TheiaCoV_FASTA, TheiaCoV_Illumina_PE, TheiaCoV_ONT

docs/assets/tables/all_workflows.tsv

Lines changed: 1 addition & 1 deletion
@@ -48,4 +48,4 @@ Name Description Applicable Kingdom Workflow Level Workflow Type Command-line Co
 [**Transfer_Column_Content**](../workflows/data_export/transfer_column_content.md) Transfer contents of a specified Terra data table column for many samples ("entities") to a GCP storage bucket location [Any taxa](../../workflows_overview/workflows_kingdom.md#any-taxa) Set-level [Exporting Data from Terra](../../workflows_overview/workflows_type.md#exporting-data-from-terra) Yes v1.3.0 [Transfer_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Transfer_Column_Content_PHB:main?tab=info)
 [**Usher**](../workflows/phylogenetic_placement/usher.md) Use UShER to rapidly and accurately place your samples on any existing phylogenetic tree Monkeypox virus, SARS-CoV-2, [Viral](../../workflows_overview/workflows_kingdom.md#viral) Sample-level, Set-level [Phylogenetic Placement](../../workflows_overview/workflows_type.md#phylogenetic-placement) Yes v2.1.0 [Usher_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Usher_PHB:main?tab=info)
 [**VADR_Update**](../workflows/genomic_characterization/vadr_update.md) Update VADR assignments HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, [Viral](../../workflows_overview/workflows_kingdom.md#viral), WNV Sample-level [Genomic Characterization](../../workflows_overview/workflows_type.md#genomic-characterization) Yes v4.0.0 [VADR_Update_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/VADR_Update_PHB:main?tab=info)
-[**Zip_Column_Content**](../workflows/data_export/zip_column_content.md) Zip contents of a specified Terra data table column for many samples ("entities") [Any taxa](../../workflows_overview/workflows_kingdom.md#any-taxa) Set-level [Exporting Data from Terra](../../workflows_overview/workflows_type.md#exporting-data-from-terra) Yes v2.1.0 [Zip_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Zip_Column_Content_PHB:main?tab=info)
+[**Zip_Column_Content**](../workflows/data_export/zip_column_content.md) Zip contents of a specified Terra data table column for many samples ("entities") [Any taxa](../../workflows_overview/workflows_kingdom.md#any-taxa) Set-level [Exporting Data from Terra](../../workflows_overview/workflows_type.md#exporting-data-from-terra) Yes vX.X.X [Zip_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Zip_Column_Content_PHB:main?tab=info)

docs/workflows/data_export/zip_column_content.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ title: Zip_Column_Content
 
 This workflow will create a zip file containing all of the items from a given column in a Terra Data Table. This is useful when you want to share a collection of result files.
 
+If a column contains files that do not have unique filenames then an index will be appended to identical filenames. Original file paths and names are preserved and referenced within the `file_translations.tsv` output.
+
 ### Inputs
 
 This workflow runs on the _set_ level.

tasks/utilities/file_handling/task_zip_files.wdl

Lines changed: 32 additions & 4 deletions
@@ -16,17 +16,45 @@ task zip_files {
   command <<<
     file_array=(~{sep=' ' files_to_zip})
     mkdir ~{zipped_file_name}
+    echo -e "origin_path\tnew_path" > file_translations.tsv
 
-    # move files oto a single directory before zipping
-    for index in ${!file_array[@]}; do
-      file=${file_array[$index]}
-      mv ${file} ~{zipped_file_name}
+    # move files into a single directory before zipping
+    for file in "${file_array[@]}"; do
+
+      echo "DEBUG: Pulling $file"
+      if [ -f "$file" ]; then
+        echo "DEBUG: $file exists"
+        filename=$(basename "$file") # Extract the filename (e.g., test.tsv)
+        dest="~{zipped_file_name}/$filename"
+
+        # Counter is always set to 1 so that if there are
+        # other duplicated filenames they will be counted as well.
+        counter=1
+
+        echo "DEBUG: Checking for $file in $dest"
+        # Check for duplicate files in the destination
+        while [ -e "$dest" ]; do
+          echo "DEBUG: Duplicate filename found, adding a file index for differentiation."
+          dest="~{zipped_file_name}/${filename%.*}_${counter}.${filename##*.}"
+          echo "DEBUG: New filename ${filename%.*}_${counter}.${filename##*.}"
+          ((counter++))
+        done
+
+        # Move the file to the destination with the new name
+        # If loop is not entered, filename will remain unchanged.
+        mv "$file" "$dest"
+        echo -e "$file\t$dest" >> file_translations.tsv
+
+      else
+        echo "File not found: $file"
+      fi
     done
 
     zip -r ~{zipped_file_name}.zip ~{zipped_file_name}
   >>>
   output {
     File zipped_files = "~{zipped_file_name}.zip"
+    File file_translations = "file_translations.tsv"
   }
   runtime {
     docker: "~{docker_image}"
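
The renaming loop in this hunk can be exercised outside WDL. The following is a minimal sketch under stated assumptions, not the committed task: `unique_dest` is a hypothetical helper name, the temporary directory and sample filenames are made up for the demo, and the branch for names without a dot is an assumption that goes beyond the commit. As committed, a duplicate of an extensionless file such as `README` would appear to become `README_1.README`, because both `${filename%.*}` and `${filename##*.}` return the whole name when it contains no dot.

```bash
#!/usr/bin/env bash
# Standalone sketch of the duplicate-filename handling shown in the diff.
# unique_dest is a hypothetical helper; it does not exist in the repository.
set -euo pipefail

# Print a path inside directory $1 for basename $2 that does not exist yet.
unique_dest() {
  local dir=$1 filename=$2
  local dest="$dir/$filename"
  local counter=1
  while [ -e "$dest" ]; do
    if [[ "$filename" == *.* ]]; then
      # Same expansion as the commit: the index goes before the last extension.
      dest="$dir/${filename%.*}_${counter}.${filename##*.}"
    else
      # Assumption beyond the commit: no dot in the name, so append the index.
      dest="$dir/${filename}_${counter}"
    fi
    counter=$((counter + 1))
  done
  printf '%s\n' "$dest"
}

# Demo in a throwaway directory: three inputs share the name report.tsv.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b" "$workdir/c" "$workdir/out"
for d in a b c; do
  echo "$d" > "$workdir/$d/report.tsv"
done

# Record each move, mirroring the origin_path/new_path header from the commit.
printf 'origin_path\tnew_path\n' > "$workdir/file_translations.tsv"
for file in "$workdir/a/report.tsv" "$workdir/b/report.tsv" "$workdir/c/report.tsv"; do
  dest=$(unique_dest "$workdir/out" "$(basename "$file")")
  mv "$file" "$dest"
  printf '%s\t%s\n' "$file" "$dest" >> "$workdir/file_translations.tsv"
done

ls "$workdir/out"
```

After the demo runs, the output directory holds `report.tsv`, `report_1.tsv`, and `report_2.tsv`, and the tracking file holds the header plus one row per moved file.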

workflows/utilities/file_handling/wf_zip_column.wdl

Lines changed: 1 addition & 0 deletions
@@ -22,5 +22,6 @@ workflow zip_column_content {
     String zip_column_content_analysis_date = version_capture.date
 
     File zipped_files = zip_files.zipped_files
+    File file_translations = zip_files.file_translations
   }
 }
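
Once the workflow exposes `file_translations`, a downstream user can map a renamed entry in the zip back to where it came from. The sketch below assumes only the two-column, tab-separated layout with the `origin_path` and `new_path` header written by the task; `original_path_for` is a hypothetical helper name, and the sample rows are invented for illustration. The origin paths recorded in a real run are whatever paths the task sees at runtime.

```bash
#!/usr/bin/env bash
# Sketch: look up the original path for a renamed file in file_translations.tsv.
# original_path_for is a hypothetical helper; the sample rows below are made up.
set -euo pipefail

# Print the origin_path whose new_path column equals $2 in table $1.
original_path_for() {
  local table=$1 new_path=$2
  # Skip the header row (NR > 1) and compare the second tab-separated field.
  awk -F '\t' -v target="$new_path" 'NR > 1 && $2 == target { print $1 }' "$table"
}

# Build a small example table in the same layout the task writes.
table=$(mktemp)
printf 'origin_path\tnew_path\n' > "$table"
printf '/inputs/run1/report.tsv\tzipped/report.tsv\n' >> "$table"
printf '/inputs/run2/report.tsv\tzipped/report_1.tsv\n' >> "$table"

original_path_for "$table" "zipped/report_1.tsv"
# prints /inputs/run2/report.tsv
```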
