When `index_name` is omitted, LiteLLM automatically creates:

- S3 vector bucket (if it doesn't exist)
- Vector index with auto-detected dimensions from your embedding model

**Dimension Auto-Detection**: The vector dimension is automatically detected by making a test embedding request to your specified model. No need to manually specify dimensions!

**Supported Embedding Models**: Works with any LiteLLM-supported embedding model (OpenAI, Cohere, Bedrock, Azure, etc.)

:::

**Example with auto-detection:**

```json
{
  "embedding": {
    "model": "text-embedding-3-small" // Dimension auto-detected as 1536
  },
  "vector_store": {
    "custom_llm_provider": "s3_vectors",
    "vector_bucket_name": "my-embeddings"
  }
}
```
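The auto-detection step described above can be sketched as follows. This is a minimal illustration, not LiteLLM's implementation: `embed_fn` is a hypothetical stand-in for a real embedding request (e.g. via `litellm.embedding`), used here so the example runs offline.

```python
# Sketch: detect the vector index dimension by making one test embedding
# request and measuring the length of the returned vector.
def detect_dimension(embed_fn) -> int:
    test_vector = embed_fn("dimension probe")
    return len(test_vector)

# Fake model standing in for text-embedding-3-small (1536-dim vectors).
fake_openai_small = lambda text: [0.0] * 1536
print(detect_dimension(fake_openai_small))  # 1536
```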
**Example with custom embedding provider:**

```json
{
  "embedding": {
    "model": "cohere/embed-english-v3.0" // Dimension auto-detected as 1024
  },
  "vector_store": {
    "custom_llm_provider": "s3_vectors",
    "vector_bucket_name": "my-embeddings"
  }
}
```
Traffic mirroring allows you to "mimic" production traffic to a secondary (silent) model for evaluation purposes. The silent model's response is gathered in the background and does not affect the latency or result of the primary request.

[**See detailed guide on A/B Testing - Traffic Mirroring here**](./traffic_mirroring.md)
Traffic mirroring allows you to "mimic" production traffic to a secondary (silent) model for evaluation purposes. The silent model's response is gathered in the background and does not affect the latency or result of the primary request.

This is useful for:

- Testing a new model's performance on production prompts before switching.
- Comparing costs and latency between different providers.
- Debugging issues by mirroring traffic to a more verbose model.

## Quick Start

To enable traffic mirroring, add `silent_model` to the `litellm_params` of a deployment.

<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/chatgpt-v-2",
            "api_key": "...",
            "silent_model": "gpt-4"  # 👈 Mirror traffic to gpt-4
        },
    },
    {
        "model_name": "gpt-4",
        "litellm_params": {
            "model": "openai/gpt-4",
            "api_key": "..."
        },
    }
]

router = Router(model_list=model_list)

# The request to "gpt-3.5-turbo" will trigger a background call to "gpt-4"
response = await router.acompletion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How does traffic mirroring work?"}]
)
```
</TabItem>
<TabItem value="proxy" label="Proxy">

Add `silent_model` to your `config.yaml`:

```yaml
model_list:
  - model_name: primary-model
    litellm_params:
      model: azure/gpt-35-turbo
      api_key: os.environ/AZURE_API_KEY
      silent_model: evaluation-model # 👈 Mirror traffic here
  - model_name: evaluation-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
</TabItem>
</Tabs>

## How it works

1. **Request Received**: A request is made to a model group (e.g. `primary-model`).
2. **Deployment Picked**: LiteLLM picks a deployment from the group.
3. **Primary Call**: LiteLLM makes the call to the primary deployment.
4. **Mirroring**: If `silent_model` is present, LiteLLM triggers a background call to that model.
   - For **Sync** calls: Uses a shared thread pool.
   - For **Async** calls: Uses `asyncio.create_task`.
5. **Isolation**: The background call uses a `deepcopy` of the original request parameters and sets `metadata["is_silent_experiment"] = True`. It also strips out logging IDs to prevent collisions in usage tracking.
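The async path of the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not LiteLLM's actual code: `call_model` is a hypothetical stand-in for the real completion call (e.g. `router.acompletion`).

```python
import asyncio
from copy import deepcopy

async def call_model(model: str, params: dict) -> str:
    # Hypothetical stand-in for a real completion call.
    await asyncio.sleep(0)
    return f"{model} response"

async def completion_with_mirror(params: dict) -> str:
    silent_model = params.get("silent_model")
    if silent_model:
        # Isolation: deepcopy so the mirrored call cannot mutate the primary
        # request, and tag it so it can be filtered in logs later.
        mirror_params = deepcopy(params)
        mirror_params.setdefault("metadata", {})["is_silent_experiment"] = True
        # Fire-and-forget: the primary request does not wait for this task.
        asyncio.create_task(call_model(silent_model, mirror_params))
    # The primary call proceeds (and returns) independently of the mirror.
    return await call_model(params["model"], params)

result = asyncio.run(completion_with_mirror(
    {"model": "primary-model", "silent_model": "evaluation-model"}
))
print(result)  # primary-model response
```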
## Key Features
- **Latency Isolation**: The primary request returns as soon as it's ready. The background (silent) call does not block.
- **Unified Logging**: Background calls are processed via the Router, meaning they are automatically logged to your configured observability tools (Langfuse, S3, etc.).
- **Evaluation**: Use the `is_silent_experiment: True` flag in your logs to filter and compare results between the primary and mirrored calls.
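For instance, filtering exported log records on that flag might look like this. The record shape here is hypothetical; adapt the field names to whatever your observability tool actually exports.

```python
# Hypothetical exported log records; only the metadata flag is from the docs.
logs = [
    {"model": "azure/chatgpt-v-2", "metadata": {}, "latency_s": 0.8},
    {"model": "gpt-4", "metadata": {"is_silent_experiment": True}, "latency_s": 1.4},
]

# Split primary vs. mirrored calls for side-by-side comparison.
mirrored = [r for r in logs if r["metadata"].get("is_silent_experiment")]
primary = [r for r in logs if not r["metadata"].get("is_silent_experiment")]
print(len(primary), len(mirrored))  # 1 1
```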