**Merged** · Changes from 24 commits

**26 commits:**
- `ff330a3` docs: add refined PRD and design doc for deployment and cross-endpoin… (deanq, Feb 21, 2026)
- `8b83f96` feat(protos): add JSON serialization format support to FunctionReques… (deanq, Feb 21, 2026)
- `0f906ca` feat(runtime): add resources_endpoints field to Manifest dataclass (deanq, Feb 21, 2026)
- `dc5b524` refactor: set FLASH_ENDPOINT_TYPE=lb alongside legacy FLASH_IS_MOTHER… (deanq, Feb 21, 2026)
- `e6a7f51` feat(runtime): use JSON serialization for deployed cross-endpoint calls (deanq, Feb 21, 2026)
- `b665aba` test: add integration tests for endpoint URL population in deployment (deanq, Feb 21, 2026)
- `3b130c7` refactor: rename run_sync to runsync and wrap request body in input e… (deanq, Feb 21, 2026)
- `4a44374` fix(scanner): use ignore-aware file walker to fix slow flash run startup (deanq, Feb 21, 2026)
- `60c97e3` fix: add missing ServerlessScalerType to top-level exports (deanq, Feb 21, 2026)
- `f5bd6c7` test: add integration tests for endpoint URL population (deanq, Feb 21, 2026)
- `0e6c417` feat(run): surface docstrings in startup table and Swagger UI (deanq, Feb 22, 2026)
- `9414cf9` refactor: remove mothership terminology and stale references (deanq, Feb 22, 2026)
- `16f6e69` feat(runtime): add deployed QB handler for plain JSON endpoints (deanq, Feb 22, 2026)
- `526bf2b` fix(build): inline deployed handler to avoid runpod_flash import (deanq, Feb 22, 2026)
- `706b24f` fix(build): stop bundling flash deps that shadow base image packages (deanq, Feb 22, 2026)
- `94e702e` fix(runtime): wrap LB handler params as JSON body instead of query pa… (deanq, Feb 22, 2026)
- `1773fc2` fix(runtime): use FLASH_ENVIRONMENT_ID for State Manager queries (deanq, Feb 22, 2026)
- `4c45b25` fix(deploy): polish deployment output and reduce log noise (deanq, Feb 22, 2026)
- `949678c` fix(deploy): self-contained LB/QB sections with one curl each (deanq, Feb 22, 2026)
- `0e1ec78` fix(build): eliminate noisy debug warnings in build pipeline (deanq, Feb 23, 2026)
- `658ea76` chore: remove design docs and PRD from feature branch (deanq, Feb 23, 2026)
- `4bf420b` fix(runtime): address PR #215 review findings across correctness, err… (deanq, Feb 23, 2026)
- `fca872c` fix(deploy): show curl examples for GET routes on LB endpoints (deanq, Feb 23, 2026)
- `ed68fa0` fix(runtime): promote execution activity logs from DEBUG to INFO (deanq, Feb 23, 2026)
- `f5de96d` feat(runtime): add file upload support for LB handlers (deanq, Feb 24, 2026)
- `c315193` feat(cli): support local=True for LB route handlers and dev server fi… (deanq, Feb 24, 2026)
**CLAUDE.md** (284 changes: 233 additions, 51 deletions)
@@ -1,73 +1,255 @@
# Flash (runpod-flash)

> Auto-generated by /analyze-repos on 2026-02-22. Manual edits will be overwritten on next analysis.

## Project Overview

runpod-flash (v1.3.0, PyPI: `runpod-flash`, MIT, Python >=3.10 <3.15) is a Python SDK for distributed inference and serving on RunPod serverless. It provides the `@remote` decorator, a CLI (`flash init/run/build/deploy`), runtime serialization, endpoint provisioning, cross-endpoint routing, and load-balanced HTTP serving.

Package: `runpod_flash` (src layout). Key deps: cloudpickle, runpod, pydantic>=2.0, rich>=14.0, typer>=0.12.

## Architecture

### Key Abstractions

1. **ServerlessResource** (`src/runpod_flash/core/resources/serverless.py:93`) -- Base Pydantic model for all serverless endpoint configs. Extends DeployableResource with config hashing, drift detection, and the deployment lifecycle.
2. **ResourceManager** (`src/runpod_flash/core/resources/resource_manager.py:22`) -- Singleton managing dynamic provisioning, persistence, config drift detection, and concurrent deployment locking.
3. **ServiceRegistry** (`src/runpod_flash/runtime/service_registry.py:21`) -- Runtime service discovery for cross-endpoint function routing. Loads `flash_manifest.json`, queries State Manager.
4. **remote() decorator** (`src/runpod_flash/client.py:48`) -- Primary public API. Three modes: stub creation, local execution in deployed env, LB route handler.
5. **stub_resource() singledispatch** (`src/runpod_flash/stubs/registry.py:22`) -- Polymorphic stub factory. Dispatches on resource type to create execution stubs.
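
`stub_resource()` is described as a `singledispatch` factory that picks a stub based on the resource's type. A minimal sketch of that dispatch pattern with `functools.singledispatch`, assuming hypothetical stand-in classes (`QueueResource`, `LoadBalancerResource`) and a hypothetical `stub_for` function; the real resource models and stub classes in `stubs/` are not reproduced here.

```python
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical stand-ins for resource configs; not the real runpod_flash classes.
@dataclass
class QueueResource:
    name: str


@dataclass
class LoadBalancerResource(QueueResource):
    route: str = "/"


@singledispatch
def stub_for(resource) -> str:
    # Fallback when no handler is registered for the resource's type.
    raise TypeError(f"no stub registered for {type(resource).__name__}")


@stub_for.register
def _(resource: QueueResource) -> str:
    return f"queue stub for {resource.name}"


@stub_for.register
def _(resource: LoadBalancerResource) -> str:
    # The most specific registered type in the MRO wins over the base-class handler.
    return f"http stub for {resource.name} at {resource.route}"


print(stub_for(QueueResource("gpu_worker")))  # queue stub for gpu_worker
print(stub_for(LoadBalancerResource("api", route="/predict")))  # http stub for api at /predict
```

Dispatching on type keeps each resource's stub logic in its own registered function, so adding a resource type means registering one more handler rather than growing an `if isinstance(...)` chain.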

### Entry Points

- **CLI**: `src/runpod_flash/cli/main.py:30` (`flash = runpod_flash.cli.main:app`). Commands: `init`, `run`, `build`, `deploy`, `env`, `app`, `undeploy`
- **Module entry**: `src/runpod_flash/cli/main.py:89`
- **Programmatic**: `src/runpod_flash/__init__.py` exports `remote`, `LiveServerless`, etc.

### Module Structure

```
src/runpod_flash/
  __init__.py              # Package root; lazy imports, dotenv, logging
  client.py                # @remote decorator (3 modes: stub/local/LB)
  config.py                # FlashPaths NamedTuple
  execute_class.py         # RemoteClassWrapper, class serialization
  logger.py                # SensitiveDataFilter, setup_logging()
  cli/                     # CLI layer (Typer-based)
    main.py                # CLI app definition
    commands/
      init.py              # flash init
      run.py               # flash run (dev server codegen + uvicorn)
      build.py             # flash build (artifact packaging)
      deploy.py            # flash deploy (build + upload + provision)
      preview.py           # flash deploy --preview (Docker Compose)
      env.py, apps.py, undeploy.py
      _run_server_helpers.py  # lb_execute, make_input_model
      build_utils/
        scanner.py         # AST-based @remote detection for API key analysis
        manifest.py        # flash_manifest.json generation
        handler_generator.py, lb_handler_generator.py, resource_config_generator.py
    utils/                 # app.py, conda.py, deployment.py, formatting.py, ignore.py, skeleton.py
  core/                    # Core business logic
    deployment.py          # DeploymentOrchestrator
    discovery.py           # ResourceDiscovery (AST scanning)
    exceptions.py, validation.py
    api/runpod.py          # RunpodGraphQLClient, RunpodRESTClient (899 lines)
    resources/             # All resource types, ResourceManager, GPU/CPU enums, constants
    utils/                 # backoff, constants, file_lock, http, lru_cache, singleton, user_agent
  runtime/                 # Deployed runtime (runs inside workers)
    service_registry.py    # Cross-endpoint routing via manifest
    production_wrapper.py  # Production handler wrapper
    lb_handler.py          # Load balancer FastAPI handler
    circuit_breaker.py, models.py, config.py, context.py
    serialization.py, state_manager_client.py
    resource_provisioner.py, retry_manager.py, metrics.py, reliability_config.py
  stubs/                   # Execution dispatch
    registry.py            # singledispatch stub factory
    live_serverless.py, serverless.py, load_balancer_sls.py, dependency_resolver.py
  protos/                  # FunctionRequest dataclass, protobuf definitions
```
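
`runtime/service_registry.py` routes cross-endpoint calls using `flash_manifest.json`. The sketch below reduces that idea to a plain lookup: find which resource owns a function, run locally if it is the current resource, otherwise return the owner's endpoint URL. The manifest shape (`resources` with per-resource `functions`, plus an `endpoints` URL map) and the helpers `build_function_index` and `route` are hypothetical; the real schema and the State Manager query are not modeled.

```python
import json
from typing import Optional

# Hypothetical manifest shape for illustration only; the real
# flash_manifest.json schema is produced by the build step.
MANIFEST_JSON = """
{
  "resources": {
    "gpu_worker": {"functions": ["generate"]},
    "cpu_worker": {"functions": ["preprocess"]}
  },
  "endpoints": {
    "gpu_worker": "https://example.invalid/gpu",
    "cpu_worker": "https://example.invalid/cpu"
  }
}
"""


def build_function_index(manifest: dict) -> dict:
    """Map each function name to the resource that hosts it."""
    index = {}
    for resource, spec in manifest["resources"].items():
        for fn in spec["functions"]:
            index[fn] = resource
    return index


def route(function_name: str, current_resource: str, manifest: dict) -> Optional[str]:
    """Return None to run in-process, or the endpoint URL of the owning resource."""
    owner = build_function_index(manifest).get(function_name)
    if owner is None:
        raise KeyError(f"{function_name!r} is not in the manifest")
    if owner == current_resource:
        return None  # same endpoint: execute locally
    return manifest["endpoints"][owner]


manifest = json.loads(MANIFEST_JSON)
print(route("preprocess", "gpu_worker", manifest))  # https://example.invalid/cpu
```

This is also why the manifest schema is listed as an interface contract: any worker that parses it at runtime depends on the exact keys.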

## Public API Surface

**20 symbols via `__all__`:**

| Symbol | Type | Purpose |
|--------|------|---------|
| `remote` | decorator | Primary API -- marks functions for remote execution |
| `LiveServerless` | resource config | GPU serverless endpoint |
| `CpuLiveServerless` | resource config | CPU serverless endpoint |
| `LiveLoadBalancer` | resource config | GPU load-balanced endpoint |
| `CpuLiveLoadBalancer` | resource config | CPU load-balanced endpoint |
| `LoadBalancerSlsResource` | resource config | LB serverless resource |
| `CpuLoadBalancerSlsResource` | resource config | CPU LB serverless resource |
| `ServerlessEndpoint` | resource config | Serverless endpoint |
| `CpuServerlessEndpoint` | resource config | CPU serverless endpoint |
| `GpuGroup` | enum | GPU type groupings |
| `GpuType` | enum | Specific GPU types |
| `CpuInstanceType` | enum | CPU instance types |
| `CudaVersion` | enum | CUDA version selection |
| `DataCenter` | enum | Data center selection |
| `PodTemplate` | model | Pod template config |
| `NetworkVolume` | model | Network volume config |
| `ServerlessScalerType` | enum | Scaler type selection |
| `ServerlessType` | enum | Serverless type selection |
| `ResourceManager` | singleton | Dynamic provisioning manager |
| `FlashApp` | model | Application definition |

**Key environment variables:** `RUNPOD_API_KEY`, `RUNPOD_API_BASE_URL`, `RUNPOD_REST_API_URL`, `RUNPOD_ENDPOINT_ID`, `RUNPOD_POD_ID`, `FLASH_RESOURCE_NAME`, `FLASH_ENVIRONMENT_ID`, `FLASH_IMAGE_TAG`, `FLASH_GPU_IMAGE`, `FLASH_CPU_IMAGE`, `FLASH_LB_IMAGE`, `FLASH_CPU_LB_IMAGE`, `LOG_LEVEL`, `FLASH_FILE_LOGGING_ENABLED`, `FLASH_CIRCUIT_BREAKER_ENABLED`, `FLASH_LB_STRATEGY`, `FLASH_RETRY_ENABLED`, `CONSOLE_BASE_URL`.

## Cross-Repo Dependencies

### Depends On

- **runpod-python** (`runpod` package) -- `runpod.endpoint.runner.Job` (internal path, fragile), `runpod.api_key` (mutable global), `runpod.endpoint_url_base`. Pinned to git main (unstable).

### Depended On By

- **flash-worker** -- imports `FunctionRequest`/`FunctionResponse` protocol from `runpod_flash.protos`, `ServiceRegistry`, `StateManagerClient`
- **flash-examples** -- all 18 worker files import from `runpod_flash`. High risk: `remote` (all break), `LiveServerless` (9 break), `GpuGroup` (7 break)

### Interface Contracts

- **FunctionRequest/FunctionResponse protocol** (`protos/`) -- shared with flash-worker. Field changes require coordinated releases.
- **@remote decorator signature** (`client.py:48`) -- any parameter changes break all flash-examples files.
- **`_hashed_fields` changes** on resource models break drift detection for deployed endpoints.
- **Manifest schema** (`flash_manifest.json`) -- changes break deployed workers that parse it at runtime.
- **Docker image names** hardcoded in `core/resources/constants.py` -- must match flash-worker Docker Hub tags.
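
To see why editing `_hashed_fields` breaks drift detection, assume drift is detected by comparing a stored hash of the listed fields with a freshly computed one. Under that assumption, changing the field list changes the hash of a config that never changed. A minimal sketch; `config_hash`, the SHA-256 over sorted-key JSON, and the field names are illustrative choices, not the library's implementation.

```python
import hashlib
import json


def config_hash(config: dict, hashed_fields: tuple) -> str:
    """Hash only the selected fields, using a stable (sorted-key) JSON encoding."""
    subset = {field: config.get(field) for field in hashed_fields}
    encoded = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical endpoint config; field names are illustrative only.
deployed = {"name": "gpu_worker", "workersMax": 3, "gpuIds": "AMPERE_16", "env": {}}

old_fields = ("name", "workersMax", "gpuIds")
new_fields = ("name", "workersMax", "gpuIds", "env")  # a field was added to the list

stored_hash = config_hash(deployed, old_fields)  # recorded at deploy time

# Unchanged config, new field list: hashes differ, so it looks like drift.
print(config_hash(deployed, new_fields) != stored_hash)  # True

# Changing a field outside the hashed set is invisible to the comparison.
print(config_hash({**deployed, "tags": ["x"]}, old_fields) == stored_hash)  # True
```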

### Dependency Chain

```
flash-examples --> flash (runpod_flash) --> runpod-python (runpod)
flash-worker --> flash (protocols) --> runpod-python (serverless.start)
```

### Known Drift

- Python version: runpod-python supports 3.8+, flash requires 3.10+
- Coverage thresholds: runpod-python 90%, flash 35%, flash-examples none
- runpod pinned to git main instead of stable release

## Development Commands

### Setup

```bash
uv venv && source .venv/bin/activate
uv sync --all-groups
```

### Testing

```bash
make test # All tests
make test-unit # Unit tests only
make test-integration # Integration tests only
make test-coverage # Tests with coverage report
```

### Quality

```bash
make quality-check # REQUIRED BEFORE ALL COMMITS (format + lint + typecheck + tests + coverage)
make quality-check-strict # Stricter threshold
make lint # Ruff linter
make lint-fix # Auto-fix lint issues
make format # Ruff formatter
make format-check # Check formatting
make typecheck # mypy type checking
```

### Build and Deploy

```bash
make build # Build wheel/sdist
make validate-wheel # Validate built packages
make dev # Install in dev mode
```

### Code Intelligence

```bash
make index # Rebuild MCP code intelligence index
make query SYMBOL=name # Query index for a symbol
```

## Code Health

### High Severity

- **20 functions exceed 50 lines.** Worst offenders:
  - `_generate_flash_server`: 322 lines (`cli/commands/run.py:268`) -- codegen monolith, needs extraction
  - `remote()`: 185 lines (`client.py:48`) -- three modes jammed into one function
  - `reconcile_and_provision_resources`: 184 lines -- deployment orchestration
  - `install_dependencies`: 179 lines -- build step
  - `run_build`: 170 lines -- build orchestration

### Medium Severity

- 6 TODOs without assignee/date (`gpu.py`, `serverless.py`, `app.py`, `network_volume.py`, `runpod.py`)
- `ResourceManager` uses class-level mutable dicts (acceptable for singleton pattern but fragile in tests)
- `runpod` dependency pinned to git main -- no stable release guarantee

### Low Severity

- Commented-out code at `cli/main.py:42` and `logger.py:141` -- delete or restore
- No mutable defaults, no bare except, no raw `print()` -- clean

## Testing

### Structure

- `tests/unit/` -- fast, isolated component tests
- `tests/integration/` -- external dependency tests
- `tests/conftest.py` -- shared fixtures
- Coverage threshold: 35% minimum

### Coverage Gaps

These files have zero or near-zero test coverage:

| File | Lines | Risk |
|------|-------|------|
| `core/api/runpod.py` | 899 | HIGH -- all GraphQL/REST calls untested |
| `config.py` | -- | LOW |
| `core/utils/backoff.py` | -- | MEDIUM -- retry logic |
| `core/utils/lru_cache.py` | -- | LOW |
| `core/utils/singleton.py` | -- | LOW |
| `cli/utils/ignore.py` | -- | LOW |
| `cli/utils/app.py` | -- | LOW |
| `cli/utils/conda.py` | -- | LOW |
| `runtime/metrics.py` | -- | MEDIUM |
| `runtime/api_key_context.py` | -- | MEDIUM -- security path |

### Patterns

- Arrange-Act-Assert in all tests
- Mock external services (RunPod API), trust internal code
- Both `tests/unit/cli/commands/test_run.py` and `tests/unit/cli/test_run.py` test codegen -- update BOTH when changing `_generate_flash_server` output
- Test `_check_makes_remote_calls()` by writing manifest to `Path.cwd() / "flash_manifest.json"`

### Key Testing Gotchas

- `@remote` on classes returns `RemoteClassWrapper` -- `inspect.signature()` gives `*args, **kwargs`, not original params. Fix: use `instance._class_type.method` for the unwrapped signature.
- QB routes only expose `/runsync` (no `/run`) in dev server
- LB codegen imports both config variable and function: `from module import config_var, func`
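
The `RemoteClassWrapper` gotcha comes from generic forwarding: a method that accepts `*args, **kwargs` is all `inspect.signature()` can see. A minimal sketch with a hypothetical `WrapperSketch` class standing in for the real wrapper; only the `_class_type` attribute name is taken from the note above.

```python
import inspect


class Model:
    def predict(self, prompt: str, max_tokens: int = 16) -> str:
        return prompt[:max_tokens]


class WrapperSketch:
    """Hypothetical stand-in for RemoteClassWrapper: forwards every call."""

    def __init__(self, cls):
        self._class_type = cls  # keeps a handle on the original class

    def predict(self, *args, **kwargs):
        # Generic forwarder, so introspection only sees *args/**kwargs.
        return self._class_type().predict(*args, **kwargs)


instance = WrapperSketch(Model)
print(inspect.signature(instance.predict))  # (*args, **kwargs)
print(inspect.signature(instance._class_type.predict))  # (self, prompt: str, max_tokens: int = 16) -> str
```

Reading the signature from the unwrapped class restores the real parameter names and defaults, which is what codegen needs to build request models.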

## Code Intelligence (MCP)

**Server:** `flash-code-intel`

**Always prefer MCP tools over Grep/Glob for semantic code searches.**

| Tool | Use Case | Example |
|------|----------|---------|
| `find_symbol(symbol)` | Find classes, functions, methods by name | `find_symbol("ResourceManager")` |
| `list_classes()` | Get all classes in codebase | Exploring class hierarchy |
| `get_class_interface(class_name)` | Inspect class methods/properties without reading full file | `get_class_interface("ServerlessResource")` |
| `list_file_symbols(file_path)` | View file structure without reading content | `list_file_symbols("src/runpod_flash/client.py")` |
| `find_by_decorator(decorator)` | Find decorated items | `find_by_decorator("remote")` |

**When to use Grep instead:** Content searches (error messages, string literals, log statements, env var usage).

---
*Last analyzed: 2026-02-22*
**README.md** (8 changes: 4 additions, 4 deletions)
@@ -179,7 +179,7 @@ This template includes:
- Pre-configured worker scaling limits using the `LiveServerless()` object.
- A `@remote` decorated function that returns a response from a worker.

When you run `flash run`, it auto-discovers all `@remote` functions and generates a local development server at `.flash/server.py`. Queue-based workers are exposed at `/{file_prefix}/runsync` (e.g., `/gpu_worker/runsync`).

### Step 3: Install Python dependencies

@@ -228,9 +228,9 @@ flash run
Open a new terminal tab or window and test your GPU API using cURL:

```bash
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"message": "Hello from the GPU!"}}'
```
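
The same request can be built from Python with only the standard library. This is a sketch: the helper name `build_runsync_request` is hypothetical, and it mirrors the cURL call by POSTing to `/{file_prefix}/runsync` with the payload nested under `input`. The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_runsync_request(base_url: str, file_prefix: str, payload: dict) -> urllib.request.Request:
    """Build a POST to /{file_prefix}/runsync with the body wrapped in "input"."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{file_prefix}/runsync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_runsync_request("http://localhost:8888", "gpu_worker", {"message": "Hello from the GPU!"})
# With `flash run` active in another terminal:
# print(urllib.request.urlopen(req).read().decode())
```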

If you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress.
@@ -253,7 +253,7 @@ Besides starting the API server, `flash run` also starts an interactive API expl

To run remote functions in the explorer:

1. Expand one of the available endpoints (e.g., `/gpu_worker/runsync`).
2. Click **Try it out** and then **Execute**.

You'll get a response from your workers right in the explorer.
**src/runpod_flash/__init__.py** (5 changes: 5 additions, 0 deletions)
@@ -30,6 +30,7 @@
PodTemplate,
ResourceManager,
ServerlessEndpoint,
ServerlessScalerType,
ServerlessType,
FlashApp,
)
@@ -58,6 +59,7 @@ def __getattr__(name):
"PodTemplate",
"ResourceManager",
"ServerlessEndpoint",
"ServerlessScalerType",
"ServerlessType",
"FlashApp",
):
@@ -78,6 +80,7 @@ def __getattr__(name):
PodTemplate,
ResourceManager,
ServerlessEndpoint,
ServerlessScalerType,
ServerlessType,
FlashApp,
)
@@ -99,6 +102,7 @@ def __getattr__(name):
"PodTemplate": PodTemplate,
"ResourceManager": ResourceManager,
"ServerlessEndpoint": ServerlessEndpoint,
"ServerlessScalerType": ServerlessScalerType,
"ServerlessType": ServerlessType,
"FlashApp": FlashApp,
}
@@ -124,6 +128,7 @@ def __getattr__(name):
"PodTemplate",
"ResourceManager",
"ServerlessEndpoint",
"ServerlessScalerType",
"ServerlessType",
"FlashApp",
]
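
Exporting one new name touches several places in this file because the package resolves exports lazily through a module-level `__getattr__` (PEP 562). A minimal sketch of that pattern, assuming a hypothetical export table that points at standard-library names so it runs without `runpod_flash` installed; `make_lazy_module` and `_LAZY_EXPORTS` are illustrative, and the real `__init__.py` imports from its own submodules.

```python
import importlib
import types

# Hypothetical export table: public name -> (module path, attribute name).
# runpod_flash maps to its own submodules; stdlib targets keep this runnable.
_LAZY_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dedent": ("textwrap", "dedent"),
}


def make_lazy_module(name: str) -> types.ModuleType:
    """Build a module whose exports are imported on first attribute access."""
    mod = types.ModuleType(name)
    mod.__all__ = list(_LAZY_EXPORTS)

    def __getattr__(attr: str):
        # PEP 562: only called when normal attribute lookup on the module fails.
        if attr not in _LAZY_EXPORTS:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, target = _LAZY_EXPORTS[attr]
        value = getattr(importlib.import_module(module_path), target)
        setattr(mod, attr, value)  # cache it so later lookups skip __getattr__
        return value

    mod.__getattr__ = __getattr__
    return mod


lazy = make_lazy_module("lazy_demo")
print("dedent" in vars(lazy))  # False: nothing resolved yet
print(lazy.dedent("    indented"))  # indented
print("dedent" in vars(lazy))  # True: cached after first access
```

A name missing from any one of these tables either fails lazily at attribute access or is absent from `__all__`, which is the kind of gap the `ServerlessScalerType` fix closes.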