-
Notifications
You must be signed in to change notification settings - Fork 0
Add Snakemake workflow for QA pipelines #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Remove unused import (os) - Fix Python interpreter detection: prefer active interpreter, fallback to .venv - Fix configfile path using workflow.source_path() for portability - Add missing log directory creation in tsnr_plots, isc_plots rules - Add localrules declaration for clean targets (run locally, not on cluster) - Convert all Path objects to strings for shell commands - Remove unused 'force' config option - Remove unnecessary README.md
- Loading branch information
There are no files selected for viewing
This file was deleted.
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -6,47 +6,55 @@ This workflow runs the QA pipeline scripts to generate: | |||||||
| - ISC plots (requires ISC computation first) | ||||||||
|
|
||||||||
| Usage: | ||||||||
| # Run all QA pipelines | ||||||||
| snakemake --cores 1 | ||||||||
| # Run all QA pipelines (from project root) | ||||||||
| snakemake --snakefile workflow/Snakefile --cores 1 | ||||||||
|
|
||||||||
| # Run specific pipeline | ||||||||
| snakemake motion_plots --cores 1 | ||||||||
| snakemake tsnr_plots --cores 1 | ||||||||
| snakemake isc_plots --cores 1 | ||||||||
| snakemake --snakefile workflow/Snakefile motion_plots --cores 1 | ||||||||
| snakemake --snakefile workflow/Snakefile tsnr_plots --cores 1 | ||||||||
| snakemake --snakefile workflow/Snakefile isc_plots --cores 1 | ||||||||
|
|
||||||||
| # Dry run to see what would be executed | ||||||||
| snakemake -n | ||||||||
| snakemake --snakefile workflow/Snakefile -n | ||||||||
|
|
||||||||
| # Process specific subjects (motion/tsnr only) | ||||||||
| snakemake motion_plots --cores 1 --config subjects="sub-001 sub-002" | ||||||||
| snakemake --snakefile workflow/Snakefile motion_plots --cores 1 --config subjects="sub-001 sub-002" | ||||||||
|
|
||||||||
| # Force rerun of specific target | ||||||||
| snakemake tsnr_plots --cores 1 --forcerun | ||||||||
| snakemake --snakefile workflow/Snakefile tsnr_plots --cores 1 --forcerun | ||||||||
| """ | ||||||||
|
|
||||||||
| import os | ||||||||
| import shutil | ||||||||
| import sys | ||||||||
| from pathlib import Path | ||||||||
|
|
||||||||
| # Configuration | ||||||||
| configfile: "workflow/config/config.yaml" | ||||||||
|
|
||||||||
| # Get project root (parent of workflow directory) | ||||||||
| PROJECT_ROOT = Path(workflow.basedir).parent | ||||||||
|
|
||||||||
| # Python interpreter from virtual environment | ||||||||
| PYTHON = PROJECT_ROOT / ".venv" / "bin" / "python" | ||||||||
| # Configuration - use path relative to workflow directory | ||||||||
| configfile: workflow.source_path("config/config.yaml") | ||||||||
|
|
||||||||
| # Python interpreter - prefer active interpreter, fallback to venv | ||||||||
| VENV_PYTHON = PROJECT_ROOT / ".venv" / "bin" / "python" | ||||||||
| if Path(sys.executable).is_file(): | ||||||||
| PYTHON = str(sys.executable) | ||||||||
| elif VENV_PYTHON.is_file(): | ||||||||
| PYTHON = str(VENV_PYTHON) | ||||||||
| else: | ||||||||
| # Fallback to system python | ||||||||
| PYTHON = shutil.which("python3") or shutil.which("python") or "python" | ||||||||
|
|
||||||||
| # Scripts directory | ||||||||
| SCRIPTS_DIR = PROJECT_ROOT / "scripts" / "qa" | ||||||||
| SCRIPTS_DIR = str(PROJECT_ROOT / "scripts" / "qa") | ||||||||
|
|
||||||||
| # Data directories (from qa_config.yaml defaults) | ||||||||
| DATA_DIR = PROJECT_ROOT / "data" | ||||||||
| DERIVATIVES_DIR = DATA_DIR / "derivatives" | ||||||||
| FMRIPREP_DIR = DERIVATIVES_DIR / "fmriprep" | ||||||||
| QA_DIR = DERIVATIVES_DIR / "qa" | ||||||||
| TSNR_DIR = QA_DIR / "tsnr" | ||||||||
| MOTION_DIR = QA_DIR / "motion" | ||||||||
| ISC_DIR = QA_DIR / "isc" | ||||||||
| DATA_DIR = str(PROJECT_ROOT / "data") | ||||||||
| DERIVATIVES_DIR = f"{DATA_DIR}/derivatives" | ||||||||
| FMRIPREP_DIR = f"{DERIVATIVES_DIR}/fmriprep" | ||||||||
| QA_DIR = f"{DERIVATIVES_DIR}/qa" | ||||||||
| TSNR_DIR = f"{QA_DIR}/tsnr" | ||||||||
| MOTION_DIR = f"{QA_DIR}/motion" | ||||||||
| ISC_DIR = f"{QA_DIR}/isc" | ||||||||
|
Comment on lines
+50
to
+57
|
||||||||
|
|
||||||||
|
|
||||||||
| def get_subjects_arg(): | ||||||||
|
|
@@ -57,12 +65,16 @@ def get_subjects_arg(): | |||||||
| return "" | ||||||||
|
|
||||||||
|
|
||||||||
| # Declare rules that don't create output files (run locally, not on cluster) | ||||||||
| localrules: all, clean_motion, clean_tsnr, clean_isc, clean_all | ||||||||
|
|
||||||||
|
|
||||||||
| # Default target: run all QA pipelines | ||||||||
| rule all: | ||||||||
| input: | ||||||||
| rules.motion_plots.output if config.get("run_motion", True) else [], | ||||||||
| rules.tsnr_plots.output if config.get("run_tsnr", True) else [], | ||||||||
| rules.isc_plots.output if config.get("run_isc", True) else [], | ||||||||
| f"{MOTION_DIR}/.motion_plots.done" if config.get("run_motion", True) else [], | ||||||||
| f"{TSNR_DIR}/.tsnr_plots.done" if config.get("run_tsnr", True) else [], | ||||||||
| f"{ISC_DIR}/.isc_plots.done" if config.get("run_isc", True) else [], | ||||||||
|
|
||||||||
|
|
||||||||
| # ============================================================================= | ||||||||
|
|
@@ -73,13 +85,11 @@ rule all: | |||||||
| rule motion_plots: | ||||||||
| """Generate motion QA plots from fMRIprep confounds.""" | ||||||||
| input: | ||||||||
| script=SCRIPTS_DIR / "qa-plot-motion.py", | ||||||||
| # Sentinel file to check fMRIprep exists | ||||||||
| fmriprep=ancient(FMRIPREP_DIR), | ||||||||
| script=f"{SCRIPTS_DIR}/qa-plot-motion.py", | ||||||||
| output: | ||||||||
| done=touch(MOTION_DIR / ".motion_plots.done"), | ||||||||
| done=touch(f"{MOTION_DIR}/.motion_plots.done"), | ||||||||
| log: | ||||||||
| MOTION_DIR / "logs" / "motion_plots.log", | ||||||||
| f"{MOTION_DIR}/logs/motion_plots.log", | ||||||||
| params: | ||||||||
| subjects=get_subjects_arg(), | ||||||||
| shell: | ||||||||
|
|
@@ -97,12 +107,11 @@ rule motion_plots: | |||||||
| rule compute_tsnr: | ||||||||
| """Compute tSNR volumes from fMRIprep BOLD data.""" | ||||||||
| input: | ||||||||
| script=SCRIPTS_DIR / "qa-save-tsnr-volume.py", | ||||||||
| fmriprep=ancient(FMRIPREP_DIR), | ||||||||
| script=f"{SCRIPTS_DIR}/qa-save-tsnr-volume.py", | ||||||||
| output: | ||||||||
| done=touch(TSNR_DIR / ".tsnr_computed.done"), | ||||||||
| done=touch(f"{TSNR_DIR}/.tsnr_computed.done"), | ||||||||
| log: | ||||||||
| TSNR_DIR / "logs" / "compute_tsnr.log", | ||||||||
| f"{TSNR_DIR}/logs/compute_tsnr.log", | ||||||||
| params: | ||||||||
| subjects=get_subjects_arg(), | ||||||||
| shell: | ||||||||
|
|
@@ -115,16 +124,17 @@ rule compute_tsnr: | |||||||
| rule tsnr_plots: | ||||||||
| """Generate tSNR QA plots from pre-computed tSNR volumes.""" | ||||||||
| input: | ||||||||
| script=SCRIPTS_DIR / "qa-plot-tsnr.py", | ||||||||
| tsnr_computed=TSNR_DIR / ".tsnr_computed.done", | ||||||||
| script=f"{SCRIPTS_DIR}/qa-plot-tsnr.py", | ||||||||
| tsnr_computed=f"{TSNR_DIR}/.tsnr_computed.done", | ||||||||
| output: | ||||||||
| done=touch(TSNR_DIR / ".tsnr_plots.done"), | ||||||||
| done=touch(f"{TSNR_DIR}/.tsnr_plots.done"), | ||||||||
| log: | ||||||||
| TSNR_DIR / "logs" / "tsnr_plots.log", | ||||||||
| f"{TSNR_DIR}/logs/tsnr_plots.log", | ||||||||
| params: | ||||||||
| subjects=get_subjects_arg(), | ||||||||
| shell: | ||||||||
| """ | ||||||||
|
||||||||
| """ | |
| """ | |
| mkdir -p {TSNR_DIR}/logs |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, params.subjects is again interpolated unescaped into a shell command, which allows shell metacharacters in the subjects config to terminate the Python invocation and run arbitrary additional commands. Because this rule may be invoked with user-specified --config subjects=..., a malicious value can lead to code execution as the Snakemake user. This should be fixed by avoiding raw string concatenation of subjects into the shell command and instead passing it as a properly quoted argument or through a safer execution mechanism.
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isc_plots rule writes its log to {ISC_DIR}/logs/... but does not create that directory. If the logs directory is missing, tee will fail. Create {ISC_DIR}/logs in the shell command (and/or ensure it exists via a dedicated directory rule).
| """ | |
| """ | |
| mkdir -p {ISC_DIR}/logs |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motion_report rule writes its log to {MOTION_DIR}/logs/... but does not ensure that directory exists. Add a mkdir -p for the logs directory in this rule to avoid failures when running the rule in isolation (e.g., after cleaning logs but keeping .motion_plots.done).
| """ | |
| """ | |
| mkdir -p {MOTION_DIR}/logs |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The params.subjects value is used directly in this shell command for the motion HTML reports, so a crafted subjects config string containing shell metacharacters can inject arbitrary commands. Since Snakemake does not automatically escape this interpolation, any component that passes attacker-controlled subjects (for example via --config) could be abused for code execution. To address this, ensure subjects is passed as a data argument rather than concatenated into the shell command string, using proper quoting or non-shell execution.
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this tSNR HTML report rule, params.subjects is interpolated directly into the shell command, enabling shell injection if subjects contains characters like ;, &&, or backticks. An attacker who can influence subjects via configuration or CLI could execute arbitrary commands with the Snakemake user's privileges. This should be hardened by treating subjects purely as an argument (properly quoted or passed via a safer interface) instead of splicing it into the shell command string.
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tsnr_report rule writes its log to {TSNR_DIR}/logs/... but does not create that directory. Add a mkdir -p for {TSNR_DIR}/logs to prevent failures when the directory is missing.
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tsnr_summary rule logs to {TSNR_DIR}/logs/... but doesn’t ensure the logs directory exists. If the directory is absent, tee will fail and the rule won’t run. Add mkdir -p {TSNR_DIR}/logs (and ensure {TSNR_DIR} exists) before invoking the script.
| """ | |
| """ | |
| mkdir -p {TSNR_DIR}/logs |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tSNR summary rule also interpolates untrusted params.subjects into a shell command, which can be exploited for shell injection if subjects is supplied with embedded shell syntax. Because Snakemake formats this string into the shell without additional escaping, a malicious subjects value can break out of the intended argument and run arbitrary commands. To mitigate, avoid embedding subjects directly in the shell string and instead pass it as a safely quoted parameter or via a non-shell invocation path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FMRIPREP_DIRis defined but never used in this Snakefile, which can be confusing when maintaining the workflow. Either remove it or use it when wiring rule inputs (e.g., as part of declared inputs for motion/tSNR rules).