Commit ecde395

Merge branch 'main' of https://github.com/CCBR/parkit
2 parents f10e0d6 + 49de4b6 commit ecde395

14 files changed

+376
-84
lines changed

CHANGELOG.md

Lines changed: 44 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -1,36 +1,61 @@
1+
## v3.0.1
2+
3+
- Added short options for `projark` subcommands (#45, @kopardev):
4+
- common: `-f` (`--folder`), `-p` (`--projectnumber`), `-d` (`--datatype`)
5+
- deposit: `-t` (`--tarname`), `-s` (`--split-size-gb`), `-k` (`--no-cleanup`)
6+
- retrieve: `-n` (`--filenames`), `-u` (`--unsplit`/`--unspilt`)
7+
- Updated `projectnumber` normalization (#44, @kopardev):
8+
- remove repeated leading `ccbr`/`CCBR` prefixes (with optional `_`/`-`)
9+
- accept any non-empty remainder (for example `CCBR-abcd` -> `abcd`)
10+
- Added absolute-path normalization for `--folder` handling in `projark` (#46, @kopardev):
11+
- relative paths are resolved to absolute paths before use
12+
- inputs with or without a trailing slash are both supported
13+
- Hardened tar command construction with shell-safe quoting for paths containing spaces/special characters.
14+
- Added ISO 8601 timestamps to `projark` log lines (#47, @kopardev).
15+
- Added completion/failure email notifications for `projark` (#48, @kopardev):
16+
- recipient: `$USER@nih.gov`
17+
- sender: `NCICCBR@mail.nih.gov`
18+
- Added Open OnDemand graphical session detection in runtime checks (future-facing; `projark` still runs only on Helix today).
19+
- Updated README/MkDocs docs with:
20+
- new short-option usage examples
21+
- current `projectnumber` behavior
22+
- path normalization behavior
23+
- timestamped logging and notification behavior
24+
- Open OnDemand availability disclaimer (Biowulf-only today; not directly available on Helix)
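
The shell-safe quoting mentioned in the v3.0.1 notes can be illustrated with the standard-library `shlex.quote`. This is only a sketch: the helper name `build_tar_command` and the exact `tar` arguments are assumptions for illustration, not the command that `projark` actually builds.

```python
import shlex


def build_tar_command(tar_path: str, source_dir: str) -> str:
    """Return a tar command string with both paths quoted for the shell.

    Hypothetical helper: the real projark command line may use different
    tar options. Only the quoting technique is the point here.
    """
    return "tar -cf {} -C {} .".format(
        shlex.quote(tar_path),    # protects spaces and special characters
        shlex.quote(source_dir),  # also handles embedded single quotes
    )


if __name__ == "__main__":
    print(build_tar_command("/scratch/user/my project.tar", "/data/run 1 (final)"))
```

Quoting each path separately keeps the command a single string (useful for logging) while still splitting into the intended arguments when a shell parses it.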
25+
126
## v3.0.0
227

328
- Replaced legacy bash `projark` workflow with a Python-native `projark` CLI (`deposit`, `retrieve`) integrated into the `parkit` package.
429
- Archived old bash implementation at `legacy/projark_legacy.sh`; `src/parkit/scripts/projark` now points to the Python CLI.
530
- Added `projark deposit` end-to-end archival flow with:
6-
- `checkapisync` preflight gate
7-
- Helix host enforcement
8-
- `tmux`/`screen` session enforcement
9-
- project/datatype normalization
10-
- scratch staging, tar/filelist generation, md5 generation
11-
- transfer via `dm_register_directory`
12-
- cleanup enabled by default (`--no-cleanup` to retain artifacts)
31+
- `checkapisync` preflight gate
32+
- Helix host enforcement
33+
- `tmux`/`screen` session enforcement
34+
- project/datatype normalization
35+
- scratch staging, tar/filelist generation, md5 generation
36+
- transfer via `dm_register_directory`
37+
- cleanup enabled by default (`--no-cleanup` to retain artifacts)
1338
- Added configurable tar split threshold/chunk size for deposit: `--split-size-gb` (default `500` GB).
1439
- Added human-readable tar size reporting (MB/GB/TB + bytes) in `projark` output.
1540
- Added `projark retrieve` enhancements:
16-
- selected-file retrieval with `--filenames`
17-
- full-collection retrieval when `--filenames` is omitted (`dm_download_collection`)
18-
- `--unsplit`/`--unspilt` merge support for multiple split tar groups
41+
- selected-file retrieval with `--filenames`
42+
- full-collection retrieval when `--filenames` is omitted (`dm_download_collection`)
43+
- `--unsplit`/`--unspilt` merge support for multiple split tar groups
1944
- Improved `projark` CLI output with stepwise status messages and consistent step numbering.
2045
- Added subcommand version support:
21-
- `projark --version`
22-
- `projark deposit --version`
23-
- `projark retrieve --version`
24-
- all return the same package-aware message
46+
- `projark --version`
47+
- `projark deposit --version`
48+
- `projark retrieve --version`
49+
- all return the same package-aware message
2550
- Suppressed bootstrap Java/environment warnings for version-only invocations.
2651
- Updated `checkapisync` logic to treat merge-history-only divergence as in-sync when local/upstream trees match.
2752
- Reworked docs to versioned MkDocs structure with `projark`-first workflows and updated operational guidance.
2853
- Updated docs/README guidance:
29-
- use `mamba activate ...` directly
30-
- initialize mamba only if not already in `PATH`
31-
- reference HPC_DME setup guide
32-
- document minimum Java requirement: `HPC_DM_JAVA_VERSION >= 23`
33-
- standardize guidance to run all operations in `tmux`/`screen`
54+
- use `mamba activate ...` directly
55+
- initialize mamba only if not already in `PATH`
56+
- reference HPC_DME setup guide
57+
- document minimum Java requirement: `HPC_DM_JAVA_VERSION >= 23`
58+
- standardize guidance to run all operations in `tmux`/`screen`
3459
- Documentation: Improved code example readability in README. (#34, @kelly-sovacool)
3560

3661
## v2.2.0

README.md

Lines changed: 45 additions & 6 deletions
@@ -40,12 +40,24 @@ Archive (`deposit`) example:
4040
projark deposit --folder /data/CCBR/projects/CCBR-12345 --projectnumber 12345 --datatype Analysis
4141
```
4242

43+
Archive (`deposit`) short-flag example:
44+
45+
```bash
46+
projark deposit -f /data/CCBR/projects/CCBR-12345 -p CCBR-12345 -d Analysis -s 250 -k
47+
```
48+
4349
Retrieve selected files example:
4450

4551
```bash
4652
projark retrieve --projectnumber 12345 --datatype Analysis --filenames new.tar_0001,new.tar_0002 --unsplit
4753
```
4854

55+
Retrieve selected files short-flag example:
56+
57+
```bash
58+
projark retrieve -p CCBR-12345 -d Analysis -n new.tar_0001,new.tar_0002 -u
59+
```
60+
4961
Retrieve full collection example (omit `--filenames`):
5062

5163
```bash
@@ -54,15 +66,41 @@ projark retrieve --projectnumber 12345 --datatype Analysis --unsplit
5466

5567
Useful `deposit` flags:
5668

69+
- `-f, --folder <path>`: input folder to archive.
70+
- `-p, --projectnumber <value>`: project identifier.
71+
- `-d, --datatype <Analysis|Rawdata>`: datatype (default `Analysis`).
5772
- `--tarname <name>.tar`: override default tar name.
58-
- `--split-size-gb <N>`: split threshold/chunk size (default `500` GB).
59-
- `--no-cleanup`: keep scratch artifacts after successful transfer.
73+
- `-t, --tarname <name>.tar`: override default tar name.
74+
- `-s, --split-size-gb <N>`: split threshold/chunk size (default `500` GB).
75+
- `-k, --no-cleanup`: keep scratch artifacts after successful transfer.
6076

6177
Useful `retrieve` flags:
6278

63-
- `--filenames a,b,c`: retrieve specific objects.
64-
- `--unsplit` (alias `--unspilt`): merge downloaded split tar parts.
65-
- `--folder /path`: override default local base (`/scratch/$USER/CCBR-<projectnumber>`).
79+
- `-f, --folder /path`: override default local base (`/scratch/$USER/CCBR-<projectnumber>`).
80+
- `-p, --projectnumber <value>`: project identifier.
81+
- `-d, --datatype <Analysis|Rawdata>`: datatype (default `Analysis`).
82+
- `-n, --filenames a,b,c`: retrieve specific objects.
83+
- `-u, --unsplit` (alias `--unspilt`): merge downloaded split tar parts.
84+
85+
`--projectnumber` normalization:
86+
87+
- Accepts any non-empty value.
88+
- Repeated leading prefixes `ccbr`, `CCBR`, `Ccbr` are removed (each may be followed by `_`, `-`, or nothing).
89+
- Examples:
90+
- `CCBR-1234` -> `1234`
91+
- `CCBR-abcd` -> `abcd`
92+
- `ccbr_ccbr-1234abc` -> `1234abc`
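
The normalization rules above can be sketched with a single regular expression. The function name `normalize_projectnumber` and the error message are assumptions for illustration; the actual `parkit` implementation may differ.

```python
import re

# One or more leading "ccbr" prefixes (any letter case), each optionally
# followed by "_" or "-".
_CCBR_PREFIX = re.compile(r"^(?:ccbr[_-]?)+", re.IGNORECASE)


def normalize_projectnumber(value: str) -> str:
    """Strip repeated leading ccbr prefixes and require a non-empty remainder.

    Hypothetical helper written to match the documented examples only.
    """
    remainder = _CCBR_PREFIX.sub("", value.strip())
    if not remainder:
        raise ValueError("projectnumber is empty after removing ccbr prefixes")
    return remainder


if __name__ == "__main__":
    for raw in ("CCBR-1234", "CCBR-abcd", "ccbr_ccbr-1234abc"):
        print(raw, "->", normalize_projectnumber(raw))
```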
93+
94+
Path behavior:
95+
96+
- `--folder FASTQ` and `--folder FASTQ/` are both accepted.
97+
- Relative `--folder` values are resolved to absolute paths before use.
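
A minimal sketch of the path behavior, assuming the standard-library `os.path.abspath` is an acceptable stand-in for whatever `projark` does internally (the helper name `normalize_folder` is hypothetical):

```python
import os


def normalize_folder(folder: str) -> str:
    """Resolve a folder argument to an absolute path.

    os.path.abspath joins relative paths onto the current working directory
    and drops a trailing slash, so "FASTQ" and "FASTQ/" give the same result.
    It does not resolve symlinks.
    """
    return os.path.abspath(folder)


if __name__ == "__main__":
    print(normalize_folder("FASTQ"))
    print(normalize_folder("FASTQ/"))
```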
98+
99+
Runtime logging and notifications:
100+
101+
- Log lines are timestamped in ISO 8601 format (for example `2026-02-19T14:37:52-05:00`).
102+
- Completion/failure email is sent to `$USER@nih.gov`.
103+
- Notification sender is `NCICCBR@mail.nih.gov`.
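
The timestamp format in the example above (seconds precision with a UTC offset) can be produced with the standard library. This is a sketch; the function name `iso_timestamp` is an assumption and the real logger may format its lines differently.

```python
from datetime import datetime
from typing import Optional


def iso_timestamp(now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 timestamp with seconds precision and UTC offset.

    When no datetime is given, the current local time is used and the local
    timezone is attached so the offset appears in the output.
    """
    current = now if now is not None else datetime.now().astimezone()
    return current.isoformat(timespec="seconds")


if __name__ == "__main__":
    print(iso_timestamp(), "deposit started")
```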
66104

67105
### Prerequisites:
68106

@@ -78,7 +116,8 @@ mamba activate /vf/users/CCBR_Pipeliner/db/PipeDB/miniforge3/envs/parkit
78116

79117
- **HPC_DM_JAVA_VERSION** minimum required value is `23` as of today.
80118

81-
- Run all operations from a `tmux` or `screen` session.
119+
- Run all operations from `tmux`, `screen`, or an Open OnDemand graphical session.
120+
- Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
82121

83122
- If `mamba` is not already in your `PATH`, add the following block to your `~/.bashrc` or `~/.zshrc`, then run `mamba activate /vf/users/CCBR_Pipeliner/db/PipeDB/miniforge3/envs/parkit`:
84123

docs/cli/projark-deposit.md

Lines changed: 24 additions & 11 deletions
@@ -8,25 +8,34 @@ Archives a local project folder into HPC-DME collection paths under:
88

99
```bash
1010
projark deposit \
11-
--folder /data/CCBR/projects/CCBR-12345 \
12-
--projectnumber 12345 \
13-
--datatype Analysis
11+
-f /data/CCBR/projects/CCBR-12345 \
12+
-p CCBR-12345 \
13+
-d Analysis
1414
```
1515

1616
## Inputs
1717

18-
- `--folder` (required): local folder to archive.
19-
- `--projectnumber` (required): accepts `1234`, `ccbr1234`, `CCBR-1234`, `ccbr_1234`.
20-
- `--datatype` (optional): `Analysis` (default) or `Rawdata` (case-insensitive).
21-
- `--tarname` (optional): override tar filename (default `ccbr<projectnumber>.tar`).
22-
- `--split-size-gb` (optional): split threshold/chunk size, default `500`.
23-
- `--cleanup` / `--no-cleanup`: cleanup is enabled by default.
18+
- `-f`, `--folder` (required): local folder to archive.
19+
- `-p`, `--projectnumber`, `--project-number` (required): project identifier.
20+
- `-d`, `--datatype` (optional): `Analysis` (default) or `Rawdata` (case-insensitive).
21+
- `-t`, `--tarname` (optional): override tar filename (default `ccbr<projectnumber>.tar`).
22+
- `-s`, `--split-size-gb` (optional): split threshold/chunk size, default `500`.
23+
- `--cleanup` / `-k`, `--no-cleanup`: cleanup is enabled by default.
24+
25+
`--projectnumber` normalization:
26+
27+
- Accepts any non-empty value.
28+
- Repeated leading `ccbr` prefixes are removed (case-insensitive; each may be followed by `_`, `-`, or nothing).
29+
- Examples:
30+
- `CCBR-1234` -> `1234`
31+
- `CCBR-abcd` -> `abcd`
32+
- `ccbr_ccbr-1234abc` -> `1234abc`
2433

2534
## Runtime Behavior
2635

2736
1. Sync gate (`checkapisync`)
2837
2. Helix host check
29-
3. `tmux`/`screen` session check
38+
3. `tmux`/`screen`/Open OnDemand graphical session check
3039
4. Ensure target collections exist (create as needed)
3140
5. Stage tar + filelist in `/scratch/$USER/CCBR-<projectnumber>/<datatype>/`
3241
6. Split tar if above split threshold
@@ -36,5 +45,9 @@ projark deposit \
3645

3746
## Notes
3847

39-
- Run from a `tmux` or `screen` session.
48+
- Run from `tmux`, `screen`, or an Open OnDemand graphical session.
49+
- Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
50+
- `--folder FASTQ` and `--folder FASTQ/` are both valid for directories.
51+
- Relative folder paths are converted to absolute paths before use.
4052
- For raw data, pass `--datatype rawdata`.
53+
- `projark` sends completion/failure email to `$USER@nih.gov` from `NCICCBR@mail.nih.gov`.

docs/cli/projark-overview.md

Lines changed: 8 additions & 1 deletion
@@ -23,4 +23,11 @@ Both subcommands run preflight checks:
2323

2424
- `parkit checkapisync`
2525
- host must be `helix.nih.gov`
26-
- session must be inside `tmux` or `screen`
26+
- session must be inside `tmux`, `screen`, or an Open OnDemand graphical session
27+
- Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
28+
29+
## Logging and Notification
30+
31+
- `projark` logs include ISO 8601 timestamps.
32+
- On completion/failure, `projark` sends an email notification to `$USER@nih.gov`.
33+
- Notification sender is `NCICCBR@mail.nih.gov`.

docs/cli/projark-retrieve.md

Lines changed: 21 additions & 11 deletions
@@ -10,35 +10,45 @@ Selected files:
1010

1111
```bash
1212
projark retrieve \
13-
--projectnumber 12345 \
14-
--datatype Analysis \
15-
--filenames new.tar_0001,new.tar_0002 \
16-
--unsplit
13+
-p CCBR-12345 \
14+
-d Analysis \
15+
-n new.tar_0001,new.tar_0002 \
16+
-u
1717
```
1818

1919
Full collection:
2020

2121
```bash
22-
projark retrieve --projectnumber 12345 --unsplit
22+
projark retrieve -p 12345 -u
2323
```
2424

2525
## Inputs
2626

27-
- `--projectnumber` (required)
28-
- `--datatype` (optional, default `Analysis`)
29-
- `--folder` (optional): local base folder (default `/scratch/$USER/CCBR-<projectnumber>`)
30-
- `--filenames` (optional): comma-separated object names; omit for full collection download
31-
- `--unsplit` / `--unspilt`: merge split tar parts after download
27+
- `-p`, `--projectnumber`, `--project-number` (required)
28+
- `-d`, `--datatype` (optional, default `Analysis`)
29+
- `-f`, `--folder` (optional): local base folder (default `/scratch/$USER/CCBR-<projectnumber>`)
30+
- `-n`, `--filenames` (optional): comma-separated object names; omit for full collection download
31+
- `-u`, `--unsplit` / `--unspilt`: merge split tar parts after download
32+
33+
`--projectnumber` normalization:
34+
35+
- Accepts any non-empty value.
36+
- Repeated leading `ccbr` prefixes are removed (case-insensitive; each may be followed by `_`, `-`, or nothing).
3237

3338
## Runtime Behavior
3439

3540
1. Sync gate (`checkapisync`)
3641
2. Helix host check
37-
3. `tmux`/`screen` session check
42+
3. `tmux`/`screen`/Open OnDemand graphical session check
3843
4. Validate source collection exists
3944
5. Download selected objects (`dm_download_dataobject`) or full collection (`dm_download_collection`)
4045
6. Optionally merge `*.tar_0001`, `*.tar_0002`, ... into tar files
4146

4247
## Merge Behavior
4348

4449
`--unsplit` supports multiple split groups in one run.
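
Merging multiple split groups can be sketched as follows, assuming parts are named `<name>.tar_0001`, `<name>.tar_0002`, and so on, as in the examples above. The function `merge_split_tars` is hypothetical and is not the `projark` implementation; it only shows grouping by base name and concatenating parts in numeric order.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Matches names such as "new.tar_0001" and captures the base tar name.
_PART = re.compile(r"^(?P<base>.+\.tar)_(?P<index>\d+)$")


def merge_split_tars(folder: str) -> list:
    """Concatenate each group of split parts into its base tar file."""
    groups = defaultdict(list)
    for path in Path(folder).iterdir():
        match = _PART.match(path.name)
        if match and path.is_file():
            groups[match.group("base")].append((int(match.group("index")), path))

    merged = []
    for base in sorted(groups):
        target = Path(folder) / base
        with open(target, "wb") as out:
            # Parts are appended in numeric order of their suffix.
            for _, part in sorted(groups[base], key=lambda item: item[0]):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(target)
    return merged
```

The sketch leaves the part files in place after merging; whether `projark` removes them is not stated here.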
50+
Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
51+
52+
`--folder FASTQ` and `--folder FASTQ/` are both valid for directories.
53+
Relative folder paths are converted to absolute paths before use.
54+
`projark` sends completion/failure email to `$USER@nih.gov` from `NCICCBR@mail.nih.gov`.

docs/getting-started.md

Lines changed: 2 additions & 1 deletion
@@ -60,7 +60,7 @@ parkit syncapi
6060

6161
## Session Safety
6262

63-
Run all operations inside `tmux` or `screen`:
63+
Run all operations inside `tmux`, `screen`, or an Open OnDemand graphical session:
6464

6565
```bash
6666
tmux new -s parkit
@@ -69,3 +69,4 @@ screen -S parkit
6969
```
7070

7171
`projark deposit` and `projark retrieve` enforce this check.
72+
Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
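
As a rough illustration of a session check: `tmux` sets the `TMUX` environment variable and GNU `screen` sets `STY` inside their sessions, so a minimal check can look for either. The function name `in_persistent_session` is an assumption, and Open OnDemand detection is left out because this document does not say how `projark` detects it.

```python
import os
from typing import Mapping, Optional


def in_persistent_session(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when running inside tmux or GNU screen.

    Hypothetical helper: checks only the TMUX and STY environment variables.
    """
    env = os.environ if env is None else env
    return bool(env.get("TMUX")) or bool(env.get("STY"))


if __name__ == "__main__":
    if not in_persistent_session():
        print("Start tmux or screen first, then rerun.")
```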

docs/index.md

Lines changed: 6 additions & 3 deletions
@@ -4,7 +4,7 @@
44

55
For most users, the recommended interface is `projark`, which provides guided `deposit` and `retrieve` workflows for entire CCBR project folder(s).
66

7-
## In This Version (`v3.0.0`)
7+
## In This Version (`v3.0.0-dev`)
88

99
- New Python-native `projark` command with structured subcommands.
1010
- `projark deposit` for project archival with sync/host/session preflight checks.
@@ -29,5 +29,8 @@ projark retrieve --help
2929
## Notes
3030

3131
- `projark` is intended for Helix (`helix.nih.gov`).
32-
- All runs should be executed in `tmux` or `screen`.
33-
- Docs are versioned; this set describes `v3.0.0` behavior.
32+
- All runs should be executed in `tmux`, `screen`, or an Open OnDemand graphical session.
33+
- Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
34+
- `projark` logs include ISO 8601 timestamps.
35+
- `projark` sends completion/failure email to `$USER@nih.gov` from `NCICCBR@mail.nih.gov`.
36+
- Docs are versioned; this set describes `v3.0.0-dev` behavior.

docs/operations.md

Lines changed: 8 additions & 0 deletions
@@ -10,7 +10,9 @@ tmux new -s parkit
1010
screen -S parkit
1111
```
1212

13+
Open OnDemand graphical sessions are also accepted.
1314
`projark` enforces this for both `deposit` and `retrieve`.
15+
Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
1416

1517
## Scratch Paths
1618

@@ -34,3 +36,9 @@ Default is `500` GB.
3436

3537
- Default: cleanup enabled after successful deposit.
3638
- To retain artifacts: `--no-cleanup`.
39+
40+
## Logging and Notifications
41+
42+
- `projark` logs include ISO 8601 timestamps.
43+
- On completion/failure, `projark` sends email to `$USER@nih.gov`.
44+
- Notification sender is `NCICCBR@mail.nih.gov`.
Lines changed: 8 additions & 2 deletions
@@ -1,15 +1,21 @@
1-
# Release Notes: v3.0.0
1+
# Release Notes: v3.0.0-dev
22

33
## Highlights
44

55
- Introduced Python-native `projark` command in package.
66
- Added `projark deposit` and `projark retrieve` subcommands.
77
- Added sync preflight gate (`parkit checkapisync`) before operations.
8-
- Added Helix and `tmux`/`screen` preflight checks.
8+
- Added Helix and session preflight checks (`tmux`, `screen`, or Open OnDemand graphical session).
9+
- Clarification: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
910
- Added configurable deposit split threshold (`--split-size-gb`, default `500`).
1011
- Added full-collection retrieval mode when `--filenames` is omitted.
1112
- Added split-part merge support (`--unsplit`) across multiple tar groups.
1213
- Archived legacy bash `projark` script.
14+
- Added short options for `projark` inputs (for example `-f`, `-p`, `-d`, `-n`, `-u`).
15+
- Added `projectnumber` normalization that strips repeated leading `ccbr` prefixes and accepts any non-empty remainder.
16+
- Added absolute-path normalization for `--folder` inputs.
17+
- Added ISO 8601 timestamps in `projark` logs.
18+
- Added completion/failure email notifications to `$USER@nih.gov` from `NCICCBR@mail.nih.gov`.
1319

1420
## Behavior Notes
1521

docs/troubleshooting.md

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@ Then rerun your `projark` command.
1616

1717
## Session Check Failure
1818

19-
If prompted to use `tmux`/`screen`, start one and rerun.
19+
If prompted to use a session wrapper, start `tmux`/`screen` (or use an Open OnDemand graphical session) and rerun.
20+
Disclaimer: Open OnDemand is currently available only on Biowulf compute nodes, not directly on Helix. Since `projark` is Helix-only today, use `tmux`/`screen` on Helix; Open OnDemand support is future-facing until Helix access is available.
2021

2122
## Token/Auth Failures
2223
