runpod · deanq · Feb 24, 2026 · Feb 21, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,73 +1,255 @@
-# {{REPO_NAME}} - {{BRANCH_NAME}} Worktree
+# Flash (runpod-flash)
+
+> Auto-generated by /analyze-repos on 2026-02-22. Manual edits will be overwritten on next analysis.
+
+## Project Overview
+
+runpod-flash (v1.3.0, PyPI: `runpod-flash`, MIT, Python >=3.10 <3.15). Python SDK for distributed inference and serving on RunPod serverless. Provides `@remote` decorator, CLI (`flash init/run/build/deploy`), runtime for serialization, endpoint provisioning, cross-endpoint routing, and load-balanced HTTP serving.
+
+Package: `runpod_flash` (src layout). Key deps: cloudpickle, runpod, pydantic>=2.0, rich>=14.0, typer>=0.12.
+
+## Architecture
+
+### Key Abstractions
+
+1. **ServerlessResource** (`src/runpod_flash/core/resources/serverless.py:93`) -- Base Pydantic model for all serverless endpoint configs. Extends DeployableResource with config hashing, drift detection, deployment lifecycle.
+2. **ResourceManager** (`src/runpod_flash/core/resources/resource_manager.py:22`) -- Singleton managing dynamic provisioning, persistence, config drift detection, concurrent deployment locking.
+3. **ServiceRegistry** (`src/runpod_flash/runtime/service_registry.py:21`) -- Runtime service discovery for cross-endpoint function routing. Loads `flash_manifest.json`, queries State Manager.
+4. **remote() decorator** (`src/runpod_flash/client.py:48`) -- Primary public API. Three modes: stub creation, local execution in deployed env, LB route handler.
+5. **stub_resource() singledispatch** (`src/runpod_flash/stubs/registry.py:22`) -- Polymorphic stub factory. Dispatches on resource type to create execution stubs.
+
+### Entry Points
+
+- **CLI**: `src/runpod_flash/cli/main.py:30` (`flash = runpod_flash.cli.main:app`). Commands: `init`, `run`, `build`, `deploy`, `env`, `app`, `undeploy`
+- **Module entry**: `src/runpod_flash/cli/main.py:89`
+- **Programmatic**: `src/runpod_flash/__init__.py` exports `remote`, `LiveServerless`, etc.
+
+### Module Structure
+
+```
+src/runpod_flash/
+  __init__.py              # Package root; lazy imports, dotenv, logging
+  client.py                # @remote decorator (3 modes: stub/local/LB)
+  config.py                # FlashPaths NamedTuple
+  execute_class.py         # RemoteClassWrapper, class serialization
+  logger.py                # SensitiveDataFilter, setup_logging()
+  cli/                     # CLI layer (Typer-based)
+    main.py                # CLI app definition
+    commands/
+      init.py              # flash init
+      run.py               # flash run (dev server codegen + uvicorn)
+      build.py             # flash build (artifact packaging)
+      deploy.py            # flash deploy (build + upload + provision)
+      preview.py           # flash deploy --preview (Docker Compose)
+      env.py, apps.py, undeploy.py
+      _run_server_helpers.py  # lb_execute, make_input_model
+      build_utils/
+        scanner.py         # AST-based @remote detection for API key analysis
+        manifest.py        # flash_manifest.json generation
+        handler_generator.py, lb_handler_generator.py, resource_config_generator.py
+    utils/                 # app.py, conda.py, deployment.py, formatting.py, ignore.py, skeleton.py
+  core/                    # Core business logic
+    deployment.py          # DeploymentOrchestrator
+    discovery.py           # ResourceDiscovery (AST scanning)
+    exceptions.py, validation.py
+    api/runpod.py          # RunpodGraphQLClient, RunpodRESTClient (899 lines)
+    resources/             # All resource types, ResourceManager, GPU/CPU enums, constants
+    utils/                 # backoff, constants, file_lock, http, lru_cache, singleton, user_agent
+  runtime/                 # Deployed runtime (runs inside workers)
+    service_registry.py    # Cross-endpoint routing via manifest
+    production_wrapper.py  # Production handler wrapper
+    lb_handler.py          # Load balancer FastAPI handler
+    circuit_breaker.py, models.py, config.py, context.py
+    serialization.py, state_manager_client.py
+    resource_provisioner.py, retry_manager.py, metrics.py, reliability_config.py
+  stubs/                   # Execution dispatch
+    registry.py            # singledispatch stub factory
+    live_serverless.py, serverless.py, load_balancer_sls.py, dependency_resolver.py
+  protos/                  # FunctionRequest dataclass, protobuf definitions
+```
+
+## Public API Surface
+
+**20 symbols via `__all__`:**
+
+| Symbol | Type | Purpose |
+|--------|------|---------|
+| `remote` | decorator | Primary API -- marks functions for remote execution |
+| `LiveServerless` | resource config | GPU serverless endpoint |
+| `CpuLiveServerless` | resource config | CPU serverless endpoint |
+| `LiveLoadBalancer` | resource config | GPU load-balanced endpoint |
+| `CpuLiveLoadBalancer` | resource config | CPU load-balanced endpoint |
+| `LoadBalancerSlsResource` | resource config | LB serverless resource |
+| `CpuLoadBalancerSlsResource` | resource config | CPU LB serverless resource |
+| `ServerlessEndpoint` | resource config | Serverless endpoint |
+| `CpuServerlessEndpoint` | resource config | CPU serverless endpoint |
+| `GpuGroup` | enum | GPU type groupings |
+| `GpuType` | enum | Specific GPU types |
+| `CpuInstanceType` | enum | CPU instance types |
+| `CudaVersion` | enum | CUDA version selection |
+| `DataCenter` | enum | Data center selection |
+| `PodTemplate` | model | Pod template config |
+| `NetworkVolume` | model | Network volume config |
+| `ServerlessScalerType` | enum | Scaler type selection |
+| `ServerlessType` | enum | Serverless type selection |
+| `ResourceManager` | singleton | Dynamic provisioning manager |
+| `FlashApp` | model | Application definition |
+
+**Key environment variables:** `RUNPOD_API_KEY`, `RUNPOD_API_BASE_URL`, `RUNPOD_REST_API_URL`, `RUNPOD_ENDPOINT_ID`, `RUNPOD_POD_ID`, `FLASH_RESOURCE_NAME`, `FLASH_ENVIRONMENT_ID`, `FLASH_IMAGE_TAG`, `FLASH_GPU_IMAGE`, `FLASH_CPU_IMAGE`, `FLASH_LB_IMAGE`, `FLASH_CPU_LB_IMAGE`, `LOG_LEVEL`, `FLASH_FILE_LOGGING_ENABLED`, `FLASH_CIRCUIT_BREAKER_ENABLED`, `FLASH_LB_STRATEGY`, `FLASH_RETRY_ENABLED`, `CONSOLE_BASE_URL`.
+
+## Cross-Repo Dependencies
+
+### Depends On
+
+- **runpod-python** (`runpod` package) -- `runpod.endpoint.runner.Job` (internal path, fragile), `runpod.api_key` (mutable global), `runpod.endpoint_url_base`. Pinned to git main (unstable).
+
+### Depended On By
+
+- **flash-worker** -- imports `FunctionRequest`/`FunctionResponse` protocol from `runpod_flash.protos`, `ServiceRegistry`, `StateManagerClient`
+- **flash-examples** -- all 18 worker files import from `runpod_flash`. Highest-risk symbols, by number of files that break if the symbol changes: `remote` (all 18), `LiveServerless` (9), `GpuGroup` (7)
+
+### Interface Contracts
+
+- **FunctionRequest/FunctionResponse protocol** (`protos/`) -- shared with flash-worker. Field changes require coordinated releases.
+- **@remote decorator signature** (`client.py:48`) -- any parameter changes break all flash-examples files.
+- **`_hashed_fields` changes** on resource models break drift detection for deployed endpoints.
+- **Manifest schema** (`flash_manifest.json`) -- changes break deployed workers that parse it at runtime.
+- **Docker image names** hardcoded in `core/resources/constants.py` -- must match flash-worker Docker Hub tags.
+
+### Dependency Chain
+
+```
+flash-examples --> flash (runpod_flash) --> runpod-python (runpod)
+flash-worker   --> flash (protocols)    --> runpod-python (serverless.start)
+```
+
+### Known Drift
+
+- Python version: runpod-python supports 3.8+, flash requires 3.10+
+- Coverage thresholds: runpod-python 90%, flash 35%, flash-examples none
+- runpod pinned to git main instead of stable release
+
+## Development Commands
+
+### Setup
+
+```bash
+uv venv && source .venv/bin/activate
+uv sync --all-groups
+```
+
+### Testing
+
+```bash
+make test                 # All tests
+make test-unit            # Unit tests only
+make test-integration     # Integration tests only
+make test-coverage        # Tests with coverage report
+```
+
+### Quality
+
+```bash
+make quality-check        # REQUIRED BEFORE ALL COMMITS (format + lint + typecheck + tests + coverage)
+make quality-check-strict # Stricter threshold
+make lint                 # Ruff linter
+make lint-fix             # Auto-fix lint issues
+make format               # Ruff formatter
+make format-check         # Check formatting
+make typecheck            # mypy type checking
+```
 
-> This worktree inherits shared development patterns from main. See: {{MAIN_CLAUDE_MD}}
+### Build and Deploy
 
-## Branch Context
+```bash
+make build                # Build wheel/sdist
+make validate-wheel       # Validate built packages
+make dev                  # Install in dev mode
+```
 
-**Purpose:** [Describe the goal of this branch - what feature, fix, or improvement are you implementing?]
+### Code Intelligence
 
-**Status:** In development
+```bash
+make index                # Rebuild MCP code intelligence index
+make query SYMBOL=name    # Query index for a symbol
+```
 
-**Related Issues/PRs:** [Link to relevant GitHub issues or PRs]
+## Code Health
 
-**Dependencies:**
-- [ ] [List any dependencies on other branches or external factors]
+### High Severity
 
-## Branch-Specific Configuration
+- **20 functions exceed 50 lines.** Worst offenders:
+  - `_generate_flash_server`: 322 lines (`cli/commands/run.py:268`) -- codegen monolith, needs extraction
+  - `remote()`: 185 lines (`client.py:48`) -- three modes jammed into one function
+  - `reconcile_and_provision_resources`: 184 lines -- deployment orchestration
+  - `install_dependencies`: 179 lines -- build step
+  - `run_build`: 170 lines -- build orchestration
 
-[Document any configuration unique to this branch:]
-- Environment variables needed
-- Special test data requirements
-- Modified build/deployment settings
-- External service configurations
+### Medium Severity
 
-## Progress Tracking
+- 6 TODOs without assignee/date (`gpu.py`, `serverless.py`, `app.py`, `network_volume.py`, `runpod.py`)
+- `ResourceManager` uses class-level mutable dicts (acceptable for singleton pattern but fragile in tests)
+- `runpod` dependency pinned to git main -- no stable release guarantee
 
-### Completed
-- [ ] [Tasks completed so far]
+### Low Severity
 
-### In Progress
-- [ ] [Current work items]
+- Commented-out code at `cli/main.py:42` and `logger.py:141` -- delete or restore
+- No mutable defaults, no bare except, no raw `print()` -- clean
 
-### Next Steps
-- [ ] [Upcoming tasks]
+## Testing
 
-## Technical Notes
+### Structure
 
-[Add branch-specific technical details:]
-- Architecture decisions made for this branch
-- Implementation approaches tried
-- Known issues or limitations
-- Performance considerations
-- Testing strategy
+- `tests/unit/` -- fast, isolated component tests
+- `tests/integration/` -- external dependency tests
+- `tests/conftest.py` -- shared fixtures
+- Coverage threshold: 35% minimum
 
-## Learnings & Discoveries
+### Coverage Gaps
 
-[Document insights gained while working on this branch:]
-- Unexpected behaviors discovered
-- Better approaches found
-- Code patterns that worked well
-- Areas for future refactoring
+These files have zero or near-zero test coverage:
 
-## Merge Checklist
+| File | Lines | Risk |
+|------|-------|------|
+| `core/api/runpod.py` | 899 | HIGH -- all GraphQL/REST calls untested |
+| `config.py` | -- | LOW |
+| `core/utils/backoff.py` | -- | MEDIUM -- retry logic |
+| `core/utils/lru_cache.py` | -- | LOW |
+| `core/utils/singleton.py` | -- | LOW |
+| `cli/utils/ignore.py` | -- | LOW |
+| `cli/utils/app.py` | -- | LOW |
+| `cli/utils/conda.py` | -- | LOW |
+| `runtime/metrics.py` | -- | MEDIUM |
+| `runtime/api_key_context.py` | -- | MEDIUM -- security path |
 
-Before merging this branch:
-- [ ] All tests passing locally (`make quality-check`)
-- [ ] Test coverage maintained/improved
-- [ ] CLAUDE.md updated in main if patterns changed
-- [ ] Documentation updated
-- [ ] No merge conflicts with main
-- [ ] CI/CD passing
-- [ ] Code reviewed
-- [ ] Migration plan documented (if needed)
+### Patterns
 
-## Context for Claude Code
+- Arrange-Act-Assert in all tests
+- Mock external services (RunPod API), trust internal code
+- Both `tests/unit/cli/commands/test_run.py` and `tests/unit/cli/test_run.py` test codegen -- update BOTH when changing `_generate_flash_server` output
+- Test `_check_makes_remote_calls()` by writing manifest to `Path.cwd() / "flash_manifest.json"`
 
-[Provide context that helps Claude Code assist more effectively:]
-- What should Claude know about this branch's goals?
-- What patterns or constraints should be followed?
-- What areas need special attention?
+### Key Testing Gotchas
 
----
+- `@remote` on classes returns `RemoteClassWrapper` -- `inspect.signature()` gives `*args, **kwargs`, not original params. Fix: use `instance._class_type.method` for the unwrapped signature.
+- QB (queue-based) routes only expose `/runsync` (no `/run`) in the dev server
+- LB codegen imports both config variable and function: `from module import config_var, func`
+
+## Code Intelligence (MCP)
+
+**Server:** `flash-code-intel`
+
+**Always prefer MCP tools over Grep/Glob for semantic code searches.**
 
-**Note:** This worktree uses the git worktree workflow. See main CLAUDE.md for shared development patterns and quality requirements.
+| Tool | Use Case | Example |
+|------|----------|---------|
+| `find_symbol(symbol)` | Find classes, functions, methods by name | `find_symbol("ResourceManager")` |
+| `list_classes()` | Get all classes in codebase | Exploring class hierarchy |
+| `get_class_interface(class_name)` | Inspect class methods/properties without reading full file | `get_class_interface("ServerlessResource")` |
+| `list_file_symbols(file_path)` | View file structure without reading content | `list_file_symbols("src/runpod_flash/client.py")` |
+| `find_by_decorator(decorator)` | Find decorated items | `find_by_decorator("remote")` |
+
+**When to use Grep instead:** Content searches (error messages, string literals, log statements, env var usage).
+
+---
+*Last analyzed: 2026-02-22*
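
Note on the "Key Testing Gotchas" entry about `inspect.signature()`: the sketch below reproduces the effect with a stand-in wrapper. `Model` and `WrapperSketch` are hypothetical names invented for illustration and are not the real `RemoteClassWrapper`; the only detail taken from CLAUDE.md is that the wrapped class is reachable through `_class_type`. It assumes the wrapper forwards method calls through a generic `*args, **kwargs` proxy, which is what would hide the original parameters.

```python
import inspect


class Model:
    """Hypothetical user class that would be decorated with @remote."""

    def predict(self, prompt: str, max_tokens: int = 16) -> str:
        return prompt[:max_tokens]


class WrapperSketch:
    """Stand-in for a class wrapper; NOT the real RemoteClassWrapper."""

    def __init__(self, *args, **kwargs):
        self._class_type = Model  # attribute name taken from CLAUDE.md
        self._args = args
        self._kwargs = kwargs

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, e.g. `predict`.
        def proxy(*args, **kwargs):
            target = getattr(self._class_type(*self._args, **self._kwargs), name)
            return target(*args, **kwargs)

        return proxy


instance = WrapperSketch()

# The proxy only advertises its generic parameters.
proxy_params = list(inspect.signature(instance.predict).parameters)

# Going through the wrapped class recovers the original parameters.
real_params = list(inspect.signature(instance._class_type.predict).parameters)
```

In a test that needs the real parameter names (for example, to build an input model), reading the signature from `_class_type` avoids asserting against `args` and `kwargs` by accident.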
diff --git a/README.md b/README.md
@@ -179,7 +179,7 @@ This template includes:
     - Pre-configured worker scaling limits using the `LiveServerless()` object.
     - A `@remote` decorated function that returns a response from a worker.
 
-When you run `flash run`, it auto-discovers all `@remote` functions and generates a local development server at `.flash/server.py`. Queue-based workers are exposed at `/{file_prefix}/run_sync` (e.g., `/gpu_worker/run_sync`).
+When you run `flash run`, it auto-discovers all `@remote` functions and generates a local development server at `.flash/server.py`. Queue-based workers are exposed at `/{file_prefix}/runsync` (e.g., `/gpu_worker/runsync`).
 
 ### Step 3: Install Python dependencies
 
@@ -228,9 +228,9 @@ flash run
 Open a new terminal tab or window and test your GPU API using cURL:
 
 ```bash
-curl -X POST http://localhost:8888/gpu_worker/run_sync \
+curl -X POST http://localhost:8888/gpu_worker/runsync \
     -H "Content-Type: application/json" \
-    -d '{"message": "Hello from the GPU!"}'
+    -d '{"input": {"message": "Hello from the GPU!"}}'
 ```
 
 If you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress.
@@ -253,7 +253,7 @@ Besides starting the API server, `flash run` also starts an interactive API expl
 
 To run remote functions in the explorer:
 
-1. Expand one of the available endpoints (e.g., `/gpu_worker/run_sync`).
+1. Expand one of the available endpoints (e.g., `/gpu_worker/runsync`).
 2. Click **Try it out** and then **Execute**.
 
 You'll get a response from your workers right in the explorer.
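
Note on the README change: the updated cURL call posts to `/{file_prefix}/runsync` and wraps the arguments in an `"input"` object. A minimal Python equivalent is sketched below using only the standard library. `build_runsync_request` is a hypothetical helper written for this note, not part of `runpod_flash`; the port, path, and payload are copied from the README example.

```python
import json
import urllib.request


def build_runsync_request(base_url: str, file_prefix: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to a queue-based worker's runsync route."""
    # The function arguments go under the "input" key, as in the cURL example.
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{file_prefix}/runsync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_runsync_request(
    "http://localhost:8888",
    "gpu_worker",
    {"message": "Hello from the GPU!"},
)

# To actually send it while `flash run` is serving:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

Sending the old flat body (`{"message": ...}`) to the new route would omit the `"input"` wrapper that the updated example uses.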

diff --git a/src/runpod_flash/__init__.py b/src/runpod_flash/__init__.py
@@ -30,6 +30,7 @@
         PodTemplate,
         ResourceManager,
         ServerlessEndpoint,
+        ServerlessScalerType,
         ServerlessType,
         FlashApp,
     )
@@ -58,6 +59,7 @@ def __getattr__(name):
         "PodTemplate",
         "ResourceManager",
         "ServerlessEndpoint",
+        "ServerlessScalerType",
         "ServerlessType",
         "FlashApp",
     ):
@@ -78,6 +80,7 @@ def __getattr__(name):
             PodTemplate,
             ResourceManager,
             ServerlessEndpoint,
+            ServerlessScalerType,
             ServerlessType,
             FlashApp,
         )
@@ -99,6 +102,7 @@ def __getattr__(name):
             "PodTemplate": PodTemplate,
             "ResourceManager": ResourceManager,
             "ServerlessEndpoint": ServerlessEndpoint,
+            "ServerlessScalerType": ServerlessScalerType,
             "ServerlessType": ServerlessType,
             "FlashApp": FlashApp,
         }
@@ -124,6 +128,7 @@ def __getattr__(name):
     "PodTemplate",
     "ResourceManager",
     "ServerlessEndpoint",
+    "ServerlessScalerType",
     "ServerlessType",
     "FlashApp",
 ]
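
Note on the `__init__.py` hunks: `ServerlessScalerType` is added in five places because the package resolves exports lazily through a module-level `__getattr__` (PEP 562). The sketch below shows that pattern in isolation, under the assumption that the real file follows the same shape as the hunks suggest: a name check, a deferred import, and a name-to-object mapping. `lazy_pkg_sketch`, `_load_resources`, and the placeholder classes are hypothetical and exist only for this illustration.

```python
import types

load_log = []  # records each deferred load, for illustration only

LAZY_NAMES = ("ServerlessScalerType", "ServerlessType")


def _load_resources():
    """Stand-in for the deferred `from ... import (...)` inside __getattr__."""
    load_log.append("core.resources")

    class ServerlessScalerType:  # placeholder, not the real enum
        pass

    class ServerlessType:  # placeholder, not the real enum
        pass

    return {
        "ServerlessScalerType": ServerlessScalerType,
        "ServerlessType": ServerlessType,
    }


def _module_getattr(name):
    # Called only when normal module attribute lookup fails (PEP 562).
    if name in LAZY_NAMES:
        return _load_resources()[name]
    raise AttributeError(f"module 'lazy_pkg_sketch' has no attribute {name!r}")


# Build a throwaway module object so the sketch runs without a package on disk.
pkg = types.ModuleType("lazy_pkg_sketch")
pkg.__all__ = list(LAZY_NAMES)
pkg.__getattr__ = _module_getattr
```

With this layout, a name that appears in `__all__` but is missing from the name check or the mapping would fail with `AttributeError` on first access, which is presumably why the diff updates every list together.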