feat: mothership manifest sync and caching #140
Merged
Implement manifest endpoint on LoadBalancer handlers to serve flash_manifest.json for cross-endpoint routing. The endpoint is conditionally registered when FLASH_IS_MOTHERSHIP=true environment variable is set, enabling child endpoints to fetch function/resource metadata from the mothership.
Changes:
- Add /manifest to reserved paths in manifest builder
- Implement conditional GET /manifest endpoint in lb_handler factory
- Returns 200 with manifest JSON on success, 404 if not found
- Endpoint only registers for LoadBalancer resources with env var set
- Add comprehensive unit and integration tests (18 unit, 4 integration)
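For illustration, the conditional registration described above could look roughly like the sketch below. It uses a plain route dictionary instead of the real lb_handler factory, and `build_routes`, `is_mothership`, and the exact string comparison against "true" are assumptions made for this example, not the PR's code.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical route table: path -> handler returning (status_code, body).
Handler = Callable[[], Tuple[int, dict]]


def is_mothership(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed check: the flag must be exactly the string 'true'."""
    environ = os.environ if environ is None else environ
    return environ.get("FLASH_IS_MOTHERSHIP") == "true"


def build_routes(
    manifest_path: Path, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, Handler]:
    """Build the route table; /manifest exists only on the mothership."""
    routes: Dict[str, Handler] = {}

    def get_manifest() -> Tuple[int, dict]:
        # 404 when the manifest file is missing, 200 with its JSON otherwise.
        if not manifest_path.exists():
            return 404, {"error": "manifest not found"}
        return 200, json.loads(manifest_path.read_text())

    if is_mothership(environ):
        routes["/manifest"] = get_manifest
    return routes
```

On a child endpoint (flag unset) the returned table has no /manifest entry at all, so a request to that path would get the same treatment as any other unregistered path.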
- Local Execution Flow: Shows synchronous path for functions in manifest
- Remote Execution Flow: Shows serialization, HTTP, and deserialization steps
- Manifest Synchronization: Shows cache-first approach with GQL fallback
Uses high-contrast MermaidJS styling with saturated colors and white text for maximum readability as per project guidelines.
- Add ManifestFetcher class with caching infrastructure (TTL: 300s)
- Integrate ManifestFetcher into lb_handler /manifest endpoint
- Use RunpodGraphQLClient for API communication
- Fall back to local flash_manifest.json when API unavailable
- Add comprehensive tests for ManifestFetcher and lb_handler
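A minimal sketch of the cache-first behaviour listed above, assuming a 300-second TTL and a local-file fallback. `TTLManifestCache`, its constructor parameters, and the choice to cache the fallback result are hypothetical; the real ManifestFetcher and RunpodGraphQLClient interfaces are not shown in this PR text, so the remote call is passed in as a plain callable.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class TTLManifestCache:
    """Illustrative cache-first fetcher; not the PR's ManifestFetcher."""

    def __init__(
        self,
        fetch_remote: Callable[[], dict],  # stand-in for the GraphQL call
        local_path: Path,                  # e.g. flash_manifest.json
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._local_path = local_path
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Serve from cache while the entry is younger than the TTL.
        if self._cached is not None and now - self._fetched_at < self._ttl:
            return self._cached
        try:
            manifest = self._fetch_remote()
        except Exception:
            # API unavailable: fall back to the local manifest file.
            manifest = json.loads(self._local_path.read_text())
        # Assumption: the fallback result is cached like a remote result.
        self._cached = manifest
        self._fetched_at = now
        return manifest
```

Injecting the clock keeps the expiry logic testable without sleeping; a production version would likely also want to log which source served the manifest.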
- Rename _directory to _endpoint_registry in ServiceRegistry
- Rename directory_client parameter to manifest_client
- Change API endpoint from /directory to /manifest
- Change JSON response key from "directory" to "manifest"
- Update _ensure_directory_loaded() to _ensure_manifest_loaded()
- Update refresh_directory() to refresh_manifest()
- Update all tests and documentation to reflect new terminology
Remove {"manifest": ...} wrapper and return manifest directly per spec
(Deployment_Architecture.md:235-273). Update ManifestClient parser to expect
manifest directly without unwrap logic.
Changes:
- Remove wrapper from GET /manifest endpoint (lb_handler.py:215)
- Update ManifestClient to validate manifest has "resources" key directly
- Replace global _manifest_fetcher with @lru_cache(maxsize=1) for thread safety
- Update all test assertions to expect unwrapped manifest format
All 636 tests pass, coverage: 66.48%
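To make the two changes above concrete, here is a small sketch of validating an unwrapped manifest and of using `functools.lru_cache(maxsize=1)` as a lazily created singleton. `parse_manifest`, `ManifestFormatError`, `_Fetcher`, and `get_fetcher` are placeholder names for this example and are not taken from the PR.

```python
from functools import lru_cache


class ManifestFormatError(ValueError):
    """Raised when a payload is not an unwrapped manifest."""


def parse_manifest(payload: object) -> dict:
    """Accept the manifest directly; no {"manifest": ...} unwrapping."""
    if not isinstance(payload, dict) or "resources" not in payload:
        raise ManifestFormatError(
            'manifest must be a JSON object with a "resources" key'
        )
    return payload


class _Fetcher:
    """Placeholder for whatever object the handler needs exactly once."""


@lru_cache(maxsize=1)
def get_fetcher() -> _Fetcher:
    # The first call constructs the instance; later calls return the cached one.
    return _Fetcher()
```

One caveat from the Python documentation: `lru_cache` keeps its internal state consistent across threads, but the wrapped function may still run more than once if a second thread calls it before the first result has been cached. If construction must happen exactly once, an explicit lock would be needed.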
…s://github.com/runpod/tetra-rp into deanq/ae-1643-mothership-manifest-sync-n-cache
Removes validation that requires function_code and class_code to be present, allowing Flash deployment requests where code is pre-deployed in /app.
Changes:
- Remove function_code requirement for execution_type='function'
- Remove class_code requirement for execution_type='class'
- Add documentation explaining optional fields for Flash deployments
This enables dual-mode runtime where the same handler serves both:
- Live Serverless (with code in request)
- Flash Deployed Apps (without code in request)
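The dual-mode behaviour can be sketched as follows. Only `execution_type`, `function_code`, and `class_code` come from the commit message; the `ExecutionRequest` dataclass and `resolve_code_source` helper are hypothetical and exist only to show how a handler might branch once the code fields are optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionRequest:
    execution_type: str                  # 'function' or 'class'
    function_code: Optional[str] = None  # optional after this change
    class_code: Optional[str] = None     # optional after this change


def resolve_code_source(request: ExecutionRequest) -> str:
    """Report where the handler would take the code from."""
    if request.execution_type == "function":
        code = request.function_code
    elif request.execution_type == "class":
        code = request.class_code
    else:
        raise ValueError(f"unknown execution_type: {request.execution_type!r}")
    # Code in the request: Live Serverless. No code: Flash app pre-deployed in /app.
    return "request" if code else "pre-deployed"
```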
…othership-manifest-sync-n-cache
jhcipar approved these changes on Jan 14, 2026
…hip-manifest-sync-n-cache
Resolved merge conflicts with hybrid approach:
- Keep "manifest" terminology and method names (get_manifest vs get_directory)
- Adopt FLASH_MOTHERSHIP_ID env var from main (instead of FLASH_MOTHERSHIP_URL)
- Adopt FLASH_RESOURCE_NAME with RUNPOD_ENDPOINT_ID fallback from main
- Document both ManifestClient and StateManagerClient
- Parameter naming: manifest_client (not directory_client)
- Internal variable: _endpoint_registry (not _directory)
Key files resolved:
- src/tetra_rp/runtime/manifest_client.py: Keep get_manifest(), use FLASH_MOTHERSHIP_ID
- src/tetra_rp/runtime/service_registry.py: Keep manifest_client param, add FLASH_RESOURCE_NAME support
- tests/unit/runtime/test_service_registry.py: Update tests for FLASH_RESOURCE_NAME
- docs/Cross_Endpoint_Routing.md: Update env vars, add StateManagerClient docs
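The FLASH_RESOURCE_NAME with RUNPOD_ENDPOINT_ID fallback mentioned above might be resolved along these lines. `resolve_current_endpoint` is a hypothetical helper, and treating an empty string as unset is an assumption; the actual logic lives in service_registry.py and is not reproduced here.

```python
import os
from typing import Mapping, Optional


def resolve_current_endpoint(
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer FLASH_RESOURCE_NAME, then fall back to RUNPOD_ENDPOINT_ID."""
    environ = os.environ if environ is None else environ
    # Assumption: an empty value counts as "not set" for both variables.
    return environ.get("FLASH_RESOURCE_NAME") or environ.get("RUNPOD_ENDPOINT_ID")
```

Accepting the environment as a parameter lets tests pass a plain dict rather than patching `os.environ`.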
- Replace FLASH_MOTHERSHIP_URL with FLASH_MOTHERSHIP_ID in integration tests
- Update tests to use FLASH_RESOURCE_NAME (with RUNPOD_ENDPOINT_ID fallback)
- Apply ruff formatting to service_registry.py
- All quality checks passing (706 tests, 63.52% coverage)
- Add missing manifest_client and cache_ttl parameters to __init__ docs
- Document FLASH_RESOURCE_NAME and RUNPOD_ENDPOINT_ID env vars in docstring
- Show _current_endpoint initialization logic
- Match actual code implementation exactly
Closed
Summary
Changes
Manifest Fetcher (Caching)
- Add ManifestFetcher class with caching infrastructure (TTL: 300s)
- Integrate ManifestFetcher into lb_handler /manifest endpoint
- Use RunpodGraphQLClient for API communication
- Fall back to local flash_manifest.json when API unavailable
Terminology Rename
- Rename _directory to _endpoint_registry in ServiceRegistry
- Rename directory_client parameter to manifest_client
- Change API endpoint from /directory to /manifest
- Change JSON response key from "directory" to "manifest"
- Update _ensure_directory_loaded() to _ensure_manifest_loaded()
- Update refresh_directory() to refresh_manifest()