This document explains how a prompt travels from the PromptForge SDK in a client application to the Backend API, and eventually to the AI Engine.
When you call `sdk.execute({ versionId: "..." })` in your code:
- Client Initialization: The `PromptForgeClient` stores your `apiKey` and `baseUrl`.
- Request Formation: It wraps your `versionId` and `variables` into a POST request.
- Header Injection: It attaches your API Key to the `x-api-key` header.
- Handling Results: Once the API responds, it converts the backend's `snake_case` JSON (e.g., `latency_ms`) into clean `camelCase` for your TypeScript code (`latencyMs`).
This is the entry point for all programmatic prompt executions.
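The client-side flow above can be sketched roughly as follows. The SDK's real internals aren't shown in this document, so the shape below is an assumption based on the steps listed (the `toCamelCase` helper and the exact request body are illustrative):

```typescript
type ExecuteParams = { versionId: string; variables?: Record<string, string> };

// Convert the backend's snake_case keys (e.g. latency_ms) to camelCase (latencyMs).
function toCamelCase(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key.replace(/_([a-z])/g, (_, c: string) => c.toUpperCase())] = value;
  }
  return out;
}

// Hypothetical client shape: stores apiKey/baseUrl, forms the POST request,
// injects the x-api-key header, and normalizes the response casing.
class PromptForgeClient {
  constructor(private apiKey: string, private baseUrl: string) {}

  async execute(params: ExecuteParams): Promise<Record<string, unknown>> {
    const res = await fetch(`${this.baseUrl}/v1/execute`, {
      method: "POST",
      headers: { "x-api-key": this.apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ version_id: params.versionId, variables: params.variables }),
    });
    return toCamelCase((await res.json()) as Record<string, unknown>);
  }
}
```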
- API Key Validation: The system checks that the key exists in the `v2_api_keys` table and is currently active.
- Rate Limiting: It enforces a limit (currently 120 req/min) using a Redis-backed rate limiter to prevent abuse.
- Payload Validation: It uses `Zod` to ensure the `version_id` is a valid UUID before doing any database work.
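A standalone sketch of the payload check: the real API uses a Zod schema (roughly `z.object({ version_id: z.string().uuid() })`), while the version below mirrors that rule with a plain regex so the example has no dependencies. The function name is a stand-in:

```typescript
// Matches the canonical 8-4-4-4-12 UUID shape with valid version/variant digits.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Reject bad input before any database work happens.
function validatePayload(body: unknown): { version_id: string } {
  const versionId = (body as { version_id?: unknown })?.version_id;
  if (typeof versionId !== "string" || !UUID_RE.test(versionId)) {
    throw new Error("version_id must be a valid UUID");
  }
  return { version_id: versionId };
}
```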
The API doesn't store the AI's response; it stores the Prompt Template.
- V2 Search: It looks for a saved prompt in `v2_prompt_versions` that matches the ID AND is owned by the workspace associated with your API Key.
- Published Lock: It only allows execution of prompt versions marked `published: true`, preventing accidental execution of drafts.
- V1 Fallback: If the ID isn't found in V2, it checks the legacy `prompts` table (Playground history) so your old prompt IDs don't break.
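The lookup order above can be sketched like this. The `Map` arguments are stand-ins for queries against `v2_prompt_versions` and the legacy `prompts` table, and the error messages are assumptions:

```typescript
type PromptVersion = { workspaceId: string; published: boolean; template: string };

function resolveTemplate(
  versionId: string,
  workspaceId: string,
  v2Versions: Map<string, PromptVersion>, // stand-in for v2_prompt_versions
  v1Prompts: Map<string, string>,         // stand-in for the legacy prompts table
): string {
  const v2 = v2Versions.get(versionId);
  if (v2 && v2.workspaceId === workspaceId) {
    // Published lock: drafts are never executable via the API.
    if (!v2.published) throw new Error("Prompt version is not published");
    return v2.template;
  }
  // V1 fallback: old Playground IDs keep working.
  const v1 = v1Prompts.get(versionId);
  if (v1 === undefined) throw new Error("Prompt not found");
  return v1;
}
```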
Once the template is retrieved:
The system takes the static template and replaces placeholders like `{{name}}` with the variables you provided in the SDK.
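A minimal sketch of that substitution step (the behavior for unknown placeholders is an assumption; here they are left intact rather than erased):

```typescript
// Replace {{name}}-style placeholders with the provided variable values.
function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    name in variables ? variables[name] : match, // leave unknown placeholders intact
  );
}
```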
Before calling the expensive AI model, it generates a hash of:
Prompt ID + System Instruction + Variables
If this exact combination was run recently, it returns the Cached Result in < 50ms, saving you model costs and time.
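One way to derive such a cache key is sketched below. The exact field order, separator, and hash algorithm are assumptions; the important property is that identical inputs always produce the same key, regardless of variable ordering:

```typescript
import { createHash } from "node:crypto";

// Stable hash over Prompt ID + System Instruction + Variables.
function cacheKey(
  promptId: string,
  systemInstruction: string,
  variables: Record<string, string>,
): string {
  // Sort variable keys so { a, b } and { b, a } hash identically.
  const sortedVars = JSON.stringify(
    Object.fromEntries(Object.entries(variables).sort(([a], [b]) => a.localeCompare(b))),
  );
  return createHash("sha256")
    .update([promptId, systemInstruction, sortedVars].join("\x1f"))
    .digest("hex");
}
```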
If no cache is found:
- Model Fallback: It tries your preferred model (e.g., Gemini 3.1 Pro).
- Auto-Switching: If that model is busy (capacity issue) or your API key's quota for that model is exhausted, it automatically tries the next best model (e.g., Gemini 1.5 Flash) until one succeeds.
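The fallback loop can be sketched as below. `callModel` is a stand-in for the real provider call, and treating every error as retryable is a simplification (a real implementation would distinguish capacity/quota errors from permanent ones):

```typescript
// Try each model in preference order until one succeeds.
async function generateWithFallback(
  prompt: string,
  models: string[], // preference order, e.g. preferred model first
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<{ model: string; text: string }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return { model, text: await callModel(model, prompt) };
    } catch (err) {
      lastError = err; // capacity or quota error: fall through to the next model
    }
  }
  throw new Error(`All models failed: ${String(lastError)}`);
}
```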
After the response is sent back to you:
- Logging: The system logs the execution to `v2_execution_logs`.
- Metrics: It records latency, token count (input/output), and the calculated USD cost.
- Cost Tracking: This data powers the "Usage & Costs" charts in your Dashboard.
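A sketch of what such a log record might contain. The field names follow the backend's `snake_case` style described earlier, but the exact schema and the per-token prices here are illustrative placeholders, not real rates:

```typescript
type ExecutionLog = {
  version_id: string;
  model: string;
  latency_ms: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
};

// Build the record written to the execution log, computing USD cost
// from token counts and (assumed) per-token prices.
function buildLog(
  versionId: string,
  model: string,
  latencyMs: number,
  inputTokens: number,
  outputTokens: number,
  pricePerInputToken: number,
  pricePerOutputToken: number,
): ExecutionLog {
  return {
    version_id: versionId,
    model,
    latency_ms: latencyMs,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost_usd: inputTokens * pricePerInputToken + outputTokens * pricePerOutputToken,
  };
}
```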
```mermaid
sequenceDiagram
    participant UserApp as Application (SDK)
    participant API as PF API (/v1/execute)
    participant Redis as Redis (Cache/RateLimit)
    participant DB as Supabase (Prompts/Logs)
    participant AI as Gemini Engine
    UserApp->>API: POST /v1/execute (API Key + ID)
    API->>Redis: Check Rate Limit
    API->>DB: Fetch Prompt Template
    API->>Redis: Check Prompt Cache
    Note over API,AI: If Cache Miss...
    API->>AI: Generate Content
    AI-->>API: Response (Tokens + Content)
    API->>DB: Insert Execution Log (Async)
    API-->>UserApp: JSON Result (Success: true)
```