feat: mothership manifest sync and caching#140

Merged
deanq merged 19 commits into main from deanq/ae-1643-mothership-manifest-sync-n-cache
Jan 14, 2026

Conversation

@deanq
Member

deanq commented Jan 12, 2026

Prerequisites: #136 and #139

Summary

  • Add GET /manifest endpoint for mothership service discovery
  • Add ManifestFetcher for caching manifest from RunPod GraphQL API
  • Rename "directory" terminology to "manifest" throughout the codebase for clarity
  • Convert ASCII diagrams to MermaidJS in documentation

Changes

Manifest Fetcher (Caching)

  • Add ManifestFetcher class with caching infrastructure (TTL: 300s)
  • Integrate into lb_handler /manifest endpoint
  • Use RunpodGraphQLClient for API communication
  • Fall back to local flash_manifest.json when API unavailable
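A minimal sketch of the caching behavior described above, assuming the remote call can be injected as a plain callable. `CachedManifestFetcher`, `fetch_remote`, and the `clock` parameter are hypothetical names for illustration; the real ManifestFetcher talks to the RunPod GraphQL API through RunpodGraphQLClient, whose query and method names this PR does not show.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Optional

Manifest = dict[str, Any]


class CachedManifestFetcher:
    """Hypothetical TTL cache in front of a remote manifest source."""

    def __init__(
        self,
        fetch_remote: Callable[[], Manifest],  # stands in for a RunpodGraphQLClient call
        local_path: Path = Path("flash_manifest.json"),
        cache_ttl: float = 300.0,  # TTL from the PR description
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._local_path = local_path
        self._cache_ttl = cache_ttl
        self._clock = clock
        self._cached: Optional[Manifest] = None
        self._fetched_at = 0.0

    def get_manifest(self) -> Manifest:
        now = self._clock()
        # Serve from cache while the entry is younger than the TTL.
        if self._cached is not None and now - self._fetched_at < self._cache_ttl:
            return self._cached
        try:
            manifest = self._fetch_remote()
        except Exception:
            # API unavailable (any error, in this sketch): fall back to the
            # manifest file bundled with the app.
            manifest = json.loads(self._local_path.read_text())
        self._cached = manifest
        self._fetched_at = now
        return manifest
```

Injecting `clock` keeps the TTL testable without sleeping. This sketch also caches the local fallback for a full TTL window, so an API outage costs at most one failed call per 300 seconds; whether the real ManifestFetcher does the same is not stated here.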

Terminology Rename

  • Rename _directory to _endpoint_registry in ServiceRegistry
  • Rename directory_client parameter to manifest_client
  • Change API endpoint from /directory to /manifest
  • Change JSON response key from "directory" to "manifest"
  • Update _ensure_directory_loaded() to _ensure_manifest_loaded()
  • Update refresh_directory() to refresh_manifest()

deanq and others added 7 commits January 11, 2026 22:27
Implement manifest endpoint on LoadBalancer handlers to serve flash_manifest.json
for cross-endpoint routing. The endpoint is conditionally registered when the
FLASH_IS_MOTHERSHIP=true environment variable is set, enabling child endpoints
to fetch function/resource metadata from the mothership.

Changes:
- Add /manifest to reserved paths in manifest builder
- Implement conditional GET /manifest endpoint in lb_handler factory
- Returns 200 with manifest JSON on success, 404 if not found
- Endpoint only registers for LoadBalancer resources with env var set
- Add comprehensive unit and integration tests (18 unit, 4 integration)
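The conditional registration can be sketched without a web framework. In this hypothetical version, `build_routes` returns a table of `(method, path)` handlers and only adds `GET /manifest` when `FLASH_IS_MOTHERSHIP` is `"true"`. The actual lb_handler factory and its framework, the exact parsing of the env var, and the 404 body shape are assumptions, and the LoadBalancer-resource check is omitted. The manifest is returned unwrapped, matching the later commit in this PR.

```python
import json
from http import HTTPStatus
from pathlib import Path
from typing import Callable, Mapping

Handler = Callable[[], tuple[int, dict]]


def is_mothership(env: Mapping[str, str]) -> bool:
    # Assumption: only "true" (case-insensitive) enables the endpoint.
    return env.get("FLASH_IS_MOTHERSHIP", "").strip().lower() == "true"


def build_routes(env: Mapping[str, str], manifest_path: Path) -> dict[tuple[str, str], Handler]:
    routes: dict[tuple[str, str], Handler] = {}

    def get_manifest() -> tuple[int, dict]:
        # 404 if the manifest file is missing, 200 with the manifest otherwise.
        if not manifest_path.is_file():
            return HTTPStatus.NOT_FOUND, {"error": "flash_manifest.json not found"}
        return HTTPStatus.OK, json.loads(manifest_path.read_text())

    # Non-mothership endpoints never expose the route at all.
    if is_mothership(env):
        routes[("GET", "/manifest")] = get_manifest
    return routes
```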
- Local Execution Flow: Shows synchronous path for functions in manifest
- Remote Execution Flow: Shows serialization, HTTP, and deserialization steps
- Manifest Synchronization: Shows cache-first approach with GQL fallback

Uses high-contrast MermaidJS styling with saturated colors and white text
for maximum readability as per project guidelines.

- Add ManifestFetcher class with caching infrastructure (TTL: 300s)
- Integrate ManifestFetcher into lb_handler /manifest endpoint
- Use RunpodGraphQLClient for API communication
- Fall back to local flash_manifest.json when API unavailable
- Add comprehensive tests for ManifestFetcher and lb_handler

- Rename _directory to _endpoint_registry in ServiceRegistry
- Rename directory_client parameter to manifest_client
- Change API endpoint from /directory to /manifest
- Change JSON response key from "directory" to "manifest"
- Update _ensure_directory_loaded() to _ensure_manifest_loaded()
- Update refresh_directory() to refresh_manifest()
- Update all tests and documentation to reflect new terminology

Remove {"manifest": ...} wrapper and return manifest directly per spec
(Deployment_Architecture.md:235-273). Update ManifestClient parser to expect
manifest directly without unwrap logic.

Changes:
- Remove wrapper from GET /manifest endpoint (lb_handler.py:215)
- Update ManifestClient to validate manifest has "resources" key directly
- Replace global _manifest_fetcher with @lru_cache(maxsize=1) for thread safety
- Update all test assertions to expect unwrapped manifest format

All 636 tests pass, coverage: 66.48%
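A sketch of the client-side check after the wrapper removal, plus the `@lru_cache(maxsize=1)` accessor pattern. `parse_manifest`, `get_manifest_fetcher`, and `ManifestFetcherStub` are hypothetical stand-ins; only the requirement that the manifest carry a top-level `"resources"` key comes from the commit message.

```python
from functools import lru_cache
from typing import Any


def parse_manifest(payload: Any) -> dict[str, Any]:
    """Accept the manifest exactly as GET /manifest returns it (no wrapper)."""
    if not isinstance(payload, dict) or "resources" not in payload:
        raise ValueError("invalid manifest: missing top-level 'resources' key")
    return payload


class ManifestFetcherStub:
    """Placeholder for ManifestFetcher so the sketch runs on its own."""


@lru_cache(maxsize=1)
def get_manifest_fetcher() -> ManifestFetcherStub:
    # A zero-argument cached function memoizes one instance, replacing a
    # module-level `_manifest_fetcher = None` global plus lazy init.
    return ManifestFetcherStub()
```

`functools.lru_cache` keeps its own bookkeeping consistent across threads, but it does not hold a lock while the wrapped function runs, so two threads racing on the very first call can each construct (and receive) their own instance, though only one ends up cached. For a cheap fetcher object that is usually acceptable.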
deanq changed the base branch from main to deanq/ae-1643-mothership-manifest January 13, 2026 21:25
deanq added 6 commits January 13, 2026 13:28
Removes validation that requires function_code and class_code to be present,
allowing Flash deployment requests where code is pre-deployed in /app.

Changes:
- Remove function_code requirement for execution_type='function'
- Remove class_code requirement for execution_type='class'
- Add documentation explaining optional fields for Flash deployments

This enables a dual-mode runtime in which the same handler serves both:
- Live Serverless (with code in request)
- Flash Deployed Apps (without code in request)
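A sketch of the relaxed validation, assuming a dict-shaped request with the `execution_type`, `function_code`, and `class_code` fields named in the commit. `validate_request` and `code_source` are hypothetical helpers, and defaulting a missing `execution_type` to `"function"` is an assumption.

```python
from typing import Any, Optional

SUPPORTED_TYPES = ("function", "class")


def validate_request(request: dict[str, Any]) -> Optional[str]:
    """Return an error message, or None if the request is acceptable."""
    execution_type = request.get("execution_type", "function")  # assumed default
    if execution_type not in SUPPORTED_TYPES:
        return f"unsupported execution_type: {execution_type!r}"
    # Removed: "function_code is required" / "class_code is required" checks.
    # Missing code now means the code is pre-deployed in /app.
    return None


def code_source(request: dict[str, Any]) -> str:
    execution_type = request.get("execution_type", "function")
    key = "function_code" if execution_type == "function" else "class_code"
    # Live Serverless sends code inline; Flash deployed apps do not.
    return "request" if request.get(key) else "/app"
```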
deanq requested a review from Copilot January 14, 2026 09:46
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

jhcipar left a comment

looks good, just need to update the gql logic

Base automatically changed from deanq/ae-1643-mothership-manifest to main January 14, 2026 20:32
deanq added 4 commits January 14, 2026 13:08
Resolved merge conflicts with hybrid approach:
- Keep "manifest" terminology and method names (get_manifest instead of get_directory)
- Adopt FLASH_MOTHERSHIP_ID env var from main (instead of FLASH_MOTHERSHIP_URL)
- Adopt FLASH_RESOURCE_NAME with RUNPOD_ENDPOINT_ID fallback from main
- Document both ManifestClient and StateManagerClient
- Parameter naming: manifest_client (not directory_client)
- Internal variable: _endpoint_registry (not _directory)

Key files resolved:
- src/tetra_rp/runtime/manifest_client.py: Keep get_manifest(), use FLASH_MOTHERSHIP_ID
- src/tetra_rp/runtime/service_registry.py: Keep manifest_client param, add FLASH_RESOURCE_NAME support
- tests/unit/runtime/test_service_registry.py: Update tests for FLASH_RESOURCE_NAME
- docs/Cross_Endpoint_Routing.md: Update env vars, add StateManagerClient docs

- Replace FLASH_MOTHERSHIP_URL with FLASH_MOTHERSHIP_ID in integration tests
- Update tests to use FLASH_RESOURCE_NAME (with RUNPOD_ENDPOINT_ID fallback)
- Apply ruff formatting to service_registry.py
- All quality checks passing (706 tests, 63.52% coverage)

- Add missing manifest_client and cache_ttl parameters to __init__ docs
- Document FLASH_RESOURCE_NAME and RUNPOD_ENDPOINT_ID env vars in docstring
- Show _current_endpoint initialization logic
- Match actual code implementation exactly
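A sketch of the endpoint-identity lookup and constructor wiring described in these commits. `resolve_current_endpoint` and `ServiceRegistrySketch` are hypothetical; the env var names, the `manifest_client` and `cache_ttl` parameters, and the `_endpoint_registry` / `_current_endpoint` attributes come from the PR, while the 300-second default for `cache_ttl` and treating an empty string as unset are assumptions.

```python
import os
from typing import Any, Mapping, Optional


def resolve_current_endpoint(env: Mapping[str, str]) -> Optional[str]:
    # FLASH_RESOURCE_NAME takes precedence; RUNPOD_ENDPOINT_ID is the fallback.
    # Empty strings are treated as unset (assumption).
    return env.get("FLASH_RESOURCE_NAME") or env.get("RUNPOD_ENDPOINT_ID") or None


class ServiceRegistrySketch:
    """Hypothetical stand-in for the ServiceRegistry.__init__ wiring."""

    def __init__(
        self,
        manifest_client: Any = None,
        cache_ttl: float = 300.0,  # assumed default
        env: Optional[Mapping[str, str]] = None,
    ) -> None:
        env = os.environ if env is None else env
        self._manifest_client = manifest_client
        self._cache_ttl = cache_ttl
        self._endpoint_registry: dict[str, Any] = {}  # formerly `_directory`
        self._current_endpoint = resolve_current_endpoint(env)
```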
deanq merged commit 20490ea into main Jan 14, 2026
7 checks passed
deanq deleted the deanq/ae-1643-mothership-manifest-sync-n-cache branch January 14, 2026 23:43
This was referenced Feb 6, 2026

3 participants