Merged
15 changes: 14 additions & 1 deletion .github/workflows/test.yml
@@ -32,8 +32,21 @@ jobs:
       with:
         credentials_json: ${{ secrets.GCP_SA_KEY }}
 
+      # Tests hit real GCS buckets and occasionally get transient 429 rate-limit errors.
+      # Retry up to 3 times with a 30s backoff, but only when the failure is a 429.
       - name: Run tests
         env:
           SISKIN_ENV_ENABLED: 1
         run: |
-          python -m unittest discover -v cloud_optimized_dicom.tests
+          for attempt in 1 2 3; do
+            python -m unittest discover -v cloud_optimized_dicom.tests 2>&1 | tee test_output.txt; [ "${PIPESTATUS[0]}" -eq 0 ] && exit 0
+            if grep -q "429" test_output.txt; then
+              echo "Attempt $attempt failed with 429 rate-limit error, retrying in 30s..."
+              sleep 30
+            else
+              echo "Tests failed for non-rate-limit reasons."
+              exit 1
+            fi
+          done
+          echo "Tests failed after 3 attempts due to 429 errors."
+          exit 1
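One subtlety in the step above: in a pipeline like `python … | tee file`, the pipeline's exit status is `tee`'s, so the test run's failure must be read from bash's `PIPESTATUS` array. The bash sketch below exercises the same retry-only-on-429 shape in isolation; `flaky` is an illustrative stand-in for the test suite that rate-limits twice before succeeding, and the 30s sleep is replaced by an echo so the sketch runs instantly.

```shell
#!/usr/bin/env bash
# Stand-in for the test suite: fails with a 429 on its first two runs.
# A temp file holds the run count, since each pipeline stage runs in a subshell.
COUNT_FILE=$(mktemp)
echo 0 > "$COUNT_FILE"

flaky() {
  n=$(cat "$COUNT_FILE")
  echo $((n + 1)) > "$COUNT_FILE"
  if [ "$n" -lt 2 ]; then
    echo "Error: 429 Too Many Requests"
    return 1
  fi
  echo "OK"
}

run_with_retry() {
  for attempt in 1 2 3; do
    flaky 2>&1 | tee output.txt
    # PIPESTATUS[0] is flaky's exit code; the pipeline's own status is tee's.
    [ "${PIPESTATUS[0]}" -eq 0 ] && return 0
    if grep -q "429" output.txt; then
      echo "attempt $attempt hit a 429; would sleep 30s here before retrying"
    else
      echo "non-429 failure; not retrying"
      return 1
    fi
  done
  return 1
}

run_with_retry
RESULT=$?
echo "final status: $RESULT"
```

The same `PIPESTATUS` guard is what makes `tee` safe to use for capturing output without hiding a failing exit code.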
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -132,11 +132,13 @@ cloud_optimized_dicom/
 **Core:**
 - `pydicom3`: Custom fork with namespace isolation (published package)
 - `google-cloud-storage`: GCS operations
-- `apache-beam[gcp]`: Data processing (CODObject serialization compatible)
 - `ratarmountcore`: Efficient tar file access
 - `zstandard`: Metadata compression (v2.0)
 - `smart-open`: Unified remote file access
 
+**Optional:**
+- `apache-beam[gcp]`: Data processing (CODObject serialization compatible); install with `pip install cloud-optimized-dicom[beam]`. Without Beam, metric counters silently no-op.
+
 **Test:**
 - `pydicom==2.3.0`: Original pydicom for validation
 - `matplotlib`: Visualization in tests
2 changes: 1 addition & 1 deletion README.md
@@ -54,8 +54,8 @@ SISKIN_ENV_ENABLED=1 python -m unittest discover -v cloud_optimized_dicom.tests
 The project uses `pyproject.toml` for package configuration and dependency management. Key dependencies include:
 - `pydicom3`: Custom fork of pydicom with namespace isolation
 - `google-cloud-storage`: For cloud storage operations
-- `apache-beam[gcp]`: For data processing pipelines
 - `zstandard`: For metadata compression (v2.0)
+- `apache-beam[gcp]` (optional): For data processing pipelines — install with `pip install cloud-optimized-dicom[beam]`
 
 # Concepts & Design Philosophy
4 changes: 3 additions & 1 deletion cloud_optimized_dicom/__init__.py
@@ -1,3 +1,5 @@
-__version__ = "0.2.1"
+from importlib.metadata import version
+
+__version__ = version("cloud-optimized-dicom")
 __author__ = "Cal Nightingale"
 __credits__ = "Gradient Health"
94 changes: 54 additions & 40 deletions cloud_optimized_dicom/metrics.py
@@ -1,44 +1,64 @@
-from apache_beam.metrics import Metrics
+try:
+    from apache_beam.metrics import Metrics
+
+    _BEAM_AVAILABLE = True
+except ImportError:
+    _BEAM_AVAILABLE = False
+
 from google.cloud.storage.constants import (
     ARCHIVE_STORAGE_CLASS,
     COLDLINE_STORAGE_CLASS,
     NEARLINE_STORAGE_CLASS,
     STANDARD_STORAGE_CLASS,
 )
 
+
+class _NoOpCounter:
+    """Drop-in replacement for Beam counters when apache-beam is not installed."""
+
+    def inc(self, n=1):
+        pass
+
+
+def _counter(namespace, name):
+    if _BEAM_AVAILABLE:
+        return Metrics.counter(namespace, name)
+    return _NoOpCounter()
+
+
 NAMESPACE = "cloud_optimized_dicom"
 
 # deletion metrics
 DELETION_NAMESPACE = f"{NAMESPACE}:deletion"
-NUM_DELETES = Metrics.counter(DELETION_NAMESPACE, "num_deletes")
-BYTES_DELETED_COUNTER = Metrics.counter(DELETION_NAMESPACE, "bytes_deleted")
-DEP_DOES_NOT_EXIST = Metrics.counter(DELETION_NAMESPACE, "dep_does_not_exist")
-INSTANCE_BLOB_CRC32C_MISMATCH = Metrics.counter(
+NUM_DELETES = _counter(DELETION_NAMESPACE, "num_deletes")
+BYTES_DELETED_COUNTER = _counter(DELETION_NAMESPACE, "bytes_deleted")
+DEP_DOES_NOT_EXIST = _counter(DELETION_NAMESPACE, "dep_does_not_exist")
+INSTANCE_BLOB_CRC32C_MISMATCH = _counter(
     DELETION_NAMESPACE, "instance_blob_crc32c_mismatch"
 )
 
 # append metrics
 APPEND_NAMESPACE = f"{NAMESPACE}:append"
-APPEND_CONFLICTS = Metrics.counter(APPEND_NAMESPACE, "append_conflicts")
-APPEND_DUPLICATES = Metrics.counter(APPEND_NAMESPACE, "append_duplicates")
-APPEND_FAILS = Metrics.counter(APPEND_NAMESPACE, "append_fails")
-APPEND_SUCCESSES = Metrics.counter(APPEND_NAMESPACE, "append_successes")
-SERIES_DUPE_COUNTER = Metrics.counter(APPEND_NAMESPACE, "num_full_duplicate_series")
-TAR_SUCCESS_COUNTER = Metrics.counter(APPEND_NAMESPACE, "tar_success")
-TAR_BYTES_PROCESSED = Metrics.counter(APPEND_NAMESPACE, "tar_bytes_processed")
-TOTAL_FILES_PROCESSED = Metrics.counter(APPEND_NAMESPACE, "total_files_processed")
+APPEND_CONFLICTS = _counter(APPEND_NAMESPACE, "append_conflicts")
+APPEND_DUPLICATES = _counter(APPEND_NAMESPACE, "append_duplicates")
+APPEND_FAILS = _counter(APPEND_NAMESPACE, "append_fails")
+APPEND_SUCCESSES = _counter(APPEND_NAMESPACE, "append_successes")
+SERIES_DUPE_COUNTER = _counter(APPEND_NAMESPACE, "num_full_duplicate_series")
+TAR_SUCCESS_COUNTER = _counter(APPEND_NAMESPACE, "tar_success")
+TAR_BYTES_PROCESSED = _counter(APPEND_NAMESPACE, "tar_bytes_processed")
+TOTAL_FILES_PROCESSED = _counter(APPEND_NAMESPACE, "total_files_processed")
 
 # Storage class counters
-STD_CREATE_COUNTER = Metrics.counter(__name__, "num_STANDARD_creates")
-STD_GET_COUNTER = Metrics.counter(__name__, "num_STANDARD_gets")
-NEARLINE_CREATE_COUNTER = Metrics.counter(__name__, "num_NEARLINE_creates")
-NEARLINE_GET_COUNTER = Metrics.counter(__name__, "num_NEARLINE_gets")
-COLDLINE_CREATE_COUNTER = Metrics.counter(__name__, "num_COLDLINE_creates")
-COLDLINE_GET_COUNTER = Metrics.counter(__name__, "num_COLDLINE_gets")
-ARCHIVE_CREATE_COUNTER = Metrics.counter(__name__, "num_ARCHIVE_creates")
-ARCHIVE_GET_COUNTER = Metrics.counter(__name__, "num_ARCHIVE_gets")
+STD_CREATE_COUNTER = _counter(__name__, "num_STANDARD_creates")
+STD_GET_COUNTER = _counter(__name__, "num_STANDARD_gets")
+NEARLINE_CREATE_COUNTER = _counter(__name__, "num_NEARLINE_creates")
+NEARLINE_GET_COUNTER = _counter(__name__, "num_NEARLINE_gets")
+COLDLINE_CREATE_COUNTER = _counter(__name__, "num_COLDLINE_creates")
+COLDLINE_GET_COUNTER = _counter(__name__, "num_COLDLINE_gets")
+ARCHIVE_CREATE_COUNTER = _counter(__name__, "num_ARCHIVE_creates")
+ARCHIVE_GET_COUNTER = _counter(__name__, "num_ARCHIVE_gets")
 # Storage class counter mappings
-STORAGE_CLASS_COUNTERS: dict[str, dict[str, Metrics.DelegatingCounter]] = {
+STORAGE_CLASS_COUNTERS: dict[str, dict] = {
     "GET": {
         STANDARD_STORAGE_CLASS: STD_GET_COUNTER,
         NEARLINE_STORAGE_CLASS: NEARLINE_GET_COUNTER,
@@ -54,27 +74,21 @@
 }
 
 # deletion metrics
-DEPS_MISSING_FROM_TAR = Metrics.counter(__name__, "deps_missing_from_tar")
-TAR_METADATA_CRC32C_MISMATCH = Metrics.counter(__name__, "tar_metadata_crc32c_mismatch")
-DEP_DOES_NOT_EXIST = Metrics.counter(__name__, "dep_does_not_exist")
-INSTANCE_BLOB_CRC32C_MISMATCH = Metrics.counter(
-    __name__, "instance_blob_crc32c_mismatch"
-)
-INSTANCES_NOT_FOUND = Metrics.counter(__name__, "instances_not_found")
-NUM_FILES_DELETED = Metrics.counter(__name__, "num_files_deleted")
+DEPS_MISSING_FROM_TAR = _counter(__name__, "deps_missing_from_tar")
+TAR_METADATA_CRC32C_MISMATCH = _counter(__name__, "tar_metadata_crc32c_mismatch")
+DEP_DOES_NOT_EXIST = _counter(__name__, "dep_does_not_exist")
+INSTANCE_BLOB_CRC32C_MISMATCH = _counter(__name__, "instance_blob_crc32c_mismatch")
+INSTANCES_NOT_FOUND = _counter(__name__, "instances_not_found")
+NUM_FILES_DELETED = _counter(__name__, "num_files_deleted")
 
 # thumbnail metrics
 THUMBNAIL_NAMESPACE = f"{NAMESPACE}:thumbnail"
-SERIES_MISSING_PIXEL_DATA = Metrics.counter(
-    THUMBNAIL_NAMESPACE, "series_missing_pixel_data"
-)
-THUMBNAIL_SUCCESSES = Metrics.counter(THUMBNAIL_NAMESPACE, "thumbnail_successes")
-THUMBNAIL_FAILS = Metrics.counter(THUMBNAIL_NAMESPACE, "thumbnail_fails")
-THUMBNAIL_BYTES_PROCESSED = Metrics.counter(
-    THUMBNAIL_NAMESPACE, "thumbnail_bytes_processed"
-)
+SERIES_MISSING_PIXEL_DATA = _counter(THUMBNAIL_NAMESPACE, "series_missing_pixel_data")
+THUMBNAIL_SUCCESSES = _counter(THUMBNAIL_NAMESPACE, "thumbnail_successes")
+THUMBNAIL_FAILS = _counter(THUMBNAIL_NAMESPACE, "thumbnail_fails")
+THUMBNAIL_BYTES_PROCESSED = _counter(THUMBNAIL_NAMESPACE, "thumbnail_bytes_processed")
 
 # metadata caching metrics
 METADATA_NAMESPACE = f"{NAMESPACE}:metadata"
-METADATA_CACHE_HITS = Metrics.counter(METADATA_NAMESPACE, "cache_hits")
-METADATA_CACHE_MISSES = Metrics.counter(METADATA_NAMESPACE, "cache_misses")
+METADATA_CACHE_HITS = _counter(METADATA_NAMESPACE, "cache_hits")
+METADATA_CACHE_MISSES = _counter(METADATA_NAMESPACE, "cache_misses")
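The fallback machinery in this file can be exercised on its own. The sketch below mirrors the `_NoOpCounter`/`_counter` pattern; it probes Beam availability with `importlib.util.find_spec` instead of the module's literal `try`/`except ImportError`, so it runs identically with or without apache-beam installed.

```python
import importlib.util

# Equivalent availability check to the try/except import guard in metrics.py.
_BEAM_AVAILABLE = importlib.util.find_spec("apache_beam") is not None


class _NoOpCounter:
    """Drop-in replacement for Beam counters when apache-beam is not installed."""

    def inc(self, n=1):
        pass


def _counter(namespace, name):
    if _BEAM_AVAILABLE:
        # Real Beam counter: increments are reported to the pipeline runner.
        from apache_beam.metrics import Metrics

        return Metrics.counter(namespace, name)
    # No Beam: increments are silently discarded, but callers need no changes.
    return _NoOpCounter()


NUM_DELETES = _counter("cloud_optimized_dicom:deletion", "num_deletes")
NUM_DELETES.inc()     # safe either way: real counter or silent no-op
NUM_DELETES.inc(100)
print(type(NUM_DELETES).__name__)
```

The trade-off is the one the CLAUDE.md note calls out: without Beam the counters no-op silently, so metric values simply vanish rather than erroring.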
7 changes: 5 additions & 2 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "cloud-optimized-dicom"
-version = "0.2.1"
+version = "0.2.2"
 description = "A library for efficiently storing and interacting with DICOM files in the cloud"
 readme = "README.md"
 authors = [
@@ -17,7 +17,6 @@ dependencies = [
     "ratarmountcore==0.7.1",
     "numpy",
     "google-cloud-storage==2.19.0",
-    "apache-beam[gcp]==2.63.0",
     "filetype==1.2.0",
     "pylibjpeg==2.0.1",
     "pylibjpeg-libjpeg==2.3.0",
@@ -31,6 +30,9 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+beam = [
+    "apache-beam[gcp]==2.63.0",
+]
 test = [
     "pydicom==2.3.0",
     "matplotlib",
@@ -39,6 +41,7 @@ dev = [
     "pre-commit>=3.0.0",
     "pydicom==2.3.0",
     "matplotlib",
+    "apache-beam[gcp]==2.63.0",
 ]
 
 [tool.setuptools]