Commit 42a0d57: Merge pull request BerriAI#19910 from BerriAI/main ("merge 01 27")
Parents: c834d7d + 70eb732

File tree: 72 files changed (+4548, −385 lines)

deploy/charts/litellm-helm/templates/deployment.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -38,6 +38,10 @@ spec:
       serviceAccountName: {{ include "litellm.serviceAccountName" . }}
       securityContext:
         {{- toYaml .Values.podSecurityContext | nindent 8 }}
+      {{- with .Values.extraInitContainers }}
+      initContainers:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
       containers:
         - name: {{ include "litellm.name" . }}
           securityContext:
```
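This change lets chart users inject init containers through values. A minimal illustrative `values.yaml` snippet (the `wait-for-db` container, its image, and its command are hypothetical examples, not part of the chart):

```yaml
# Rendered into spec.template.spec.initContainers by the template above
extraInitContainers:
  - name: wait-for-db            # hypothetical init container
    image: busybox:1.36
    command: ["sh", "-c", "until nc -z my-postgres 5432; do sleep 2; done"]
```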

deploy/charts/litellm-helm/templates/migrations-job.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -35,6 +35,10 @@ spec:
         {{- toYaml . | nindent 8 }}
       {{- end }}
       serviceAccountName: {{ include "litellm.serviceAccountName" . }}
+      {{- with .Values.migrationJob.extraInitContainers }}
+      initContainers:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
       containers:
         - name: prisma-migrations
           image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default (printf "main-%s" .Chart.AppVersion) }}"
```

deploy/charts/litellm-helm/values.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -281,6 +281,7 @@ migrationJob:
   #   cpu: 100m
   #   memory: 100Mi
   extraContainers: []
+  extraInitContainers: []

 # Hook configuration
 hooks:
```

docs/my-website/docs/providers/gemini.md

Lines changed: 51 additions & 0 deletions
@@ -1840,6 +1840,57 @@ content = response.get('choices', [{}])[0].get('message', {}).get('content')

The following section is added after the existing example:

## gemini-robotics-er-1.5-preview Usage

```python
import base64
import json
import re

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-12345")
base64_image = base64.b64encode(open("closeup-object-on-table-many-260nw-1216144471.webp", "rb").read()).decode()

tools = [{"codeExecution": {}}]
response = client.chat.completions.create(
    model="gemini/gemini-robotics-er-1.5-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Point to no more than 10 items in the image. The label returned should be an identifying name for the object detected. The answer should follow the json format: [{\"point\": [y, x], \"label\": <label1>}, ...]. The points are in [y, x] format normalized to 0-1000."
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                }
            ]
        }
    ],
    tools=tools
)

# Extract JSON from a markdown code block, if present
content = response.choices[0].message.content
match = re.search(r'```json\s*(.*?)\s*```', content, re.DOTALL)
json_str = match.group(1) if match else content

try:
    data = json.loads(json_str)
    print(json.dumps(data, indent=2))
except Exception as e:
    print("Error parsing response as JSON:", e)
    print("Response content:", content)
```

## Usage - PDF / Videos / etc. Files

### Inline Data (e.g. audio stream)
Lines changed: 89 additions & 0 deletions

@@ -0,0 +1,89 @@

# Sarvam.ai

LiteLLM supports all the text models from [Sarvam AI](https://docs.sarvam.ai/api-reference-docs/chat/chat-completions).

## Usage

```python
import os
from litellm import completion

# Set your Sarvam API key
os.environ["SARVAM_API_KEY"] = ""

messages = [{"role": "user", "content": "Hello"}]

response = completion(
    model="sarvam/sarvam-m",
    messages=messages,
)
print(response)
```

## Usage with LiteLLM Proxy Server

Here's how to call a Sarvam.ai model with the LiteLLM Proxy Server:

1. **Modify the `config.yaml`:**

```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: sarvam/<your-model-name>  # add sarvam/ prefix to route as a Sarvam provider
      api_key: api-key                 # api key for your model
```

2. **Start the proxy:**

```bash
$ litellm --config /path/to/config.yaml
```

3. **Send a request to the LiteLLM Proxy Server:**

<Tabs>

<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",             # pass your litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm proxy base url
)

response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
  "model": "my-model",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
```
</TabItem>

</Tabs>

docs/my-website/docs/proxy/guardrails/quick_start.md

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
```diff
@@ -405,14 +405,10 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \

 ## **Proxy Admin Controls**

-### Monitoring Guardrails
+### Monitoring Guardrails

 Monitor which guardrails were executed and whether they passed or failed. e.g. guardrail going rogue and failing requests we don't intend to fail

-:::info
-
-✨ This is an Enterprise only feature [Get a free trial](https://www.litellm.ai/enterprise#trial)
-
 :::

 #### Setup
```

docs/my-website/docs/rag_ingest.md

Lines changed: 77 additions & 1 deletion
```diff
@@ -5,7 +5,7 @@ All-in-one document ingestion pipeline: **Upload → Chunk → Embed → Vector

 | Feature | Supported |
 |---------|-----------|
 | Logging | Yes |
-| Supported Providers | `openai`, `bedrock`, `vertex_ai`, `gemini` |
+| Supported Providers | `openai`, `bedrock`, `vertex_ai`, `gemini`, `s3_vectors` |

 :::tip
 After ingesting documents, use [/rag/query](./rag_query.md) to search and generate responses with your ingested content.
```
@@ -75,6 +75,31 @@

Added after the existing ingest example:

### AWS S3 Vectors

```bash showLineNumbers title="Ingest to S3 Vectors"
curl -X POST "http://localhost:4000/v1/rag/ingest" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d "{
    \"file\": {
      \"filename\": \"document.txt\",
      \"content\": \"$(base64 -i document.txt)\",
      \"content_type\": \"text/plain\"
    },
    \"ingest_options\": {
      \"embedding\": {
        \"model\": \"text-embedding-3-small\"
      },
      \"vector_store\": {
        \"custom_llm_provider\": \"s3_vectors\",
        \"vector_bucket_name\": \"my-embeddings\",
        \"aws_region_name\": \"us-west-2\"
      }
    }
  }"
```
## Response

…

@@ -265,6 +290,57 @@

4. Install: `pip install 'google-cloud-aiplatform>=1.60.0'`
:::
### vector_store (AWS S3 Vectors)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `custom_llm_provider` | string | - | `"s3_vectors"` |
| `vector_bucket_name` | string | **required** | S3 vector bucket name |
| `index_name` | string | auto-create | Vector index name |
| `dimension` | integer | auto-detect | Vector dimension (auto-detected from the embedding model) |
| `distance_metric` | string | `cosine` | Distance metric: `cosine` or `euclidean` |
| `non_filterable_metadata_keys` | array | `["source_text"]` | Metadata keys excluded from filtering |
| `aws_region_name` | string | `us-west-2` | AWS region |
| `aws_access_key_id` | string | env | AWS access key |
| `aws_secret_access_key` | string | env | AWS secret key |

:::info S3 Vectors Auto-Creation
When `index_name` is omitted, LiteLLM automatically creates:
- the S3 vector bucket (if it doesn't exist)
- a vector index with dimensions auto-detected from your embedding model

**Dimension Auto-Detection**: The vector dimension is detected automatically by making a test embedding request to your specified model, so there is no need to specify dimensions manually.

**Supported Embedding Models**: Works with any LiteLLM-supported embedding model (OpenAI, Cohere, Bedrock, Azure, etc.).
:::

**Example with auto-detection:**
```json
{
  "embedding": {
    "model": "text-embedding-3-small"  // Dimension auto-detected as 1536
  },
  "vector_store": {
    "custom_llm_provider": "s3_vectors",
    "vector_bucket_name": "my-embeddings"
  }
}
```

**Example with a custom embedding provider:**
```json
{
  "embedding": {
    "model": "cohere/embed-english-v3.0"  // Dimension auto-detected as 1024
  },
  "vector_store": {
    "custom_llm_provider": "s3_vectors",
    "vector_bucket_name": "my-embeddings",
    "distance_metric": "cosine"
  }
}
```
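The auto-detection described above (one test embedding, then read off the vector length) can be sketched in a few lines. This is a simplified illustration, not LiteLLM's implementation; `detect_dimension` and the injected `embed_fn` callable are hypothetical names:

```python
from typing import Callable, List

def detect_dimension(embed_fn: Callable[[str], List[float]]) -> int:
    """Detect the index dimension by embedding a short probe string."""
    # One test embedding is enough: the returned vector's length is the dimension.
    return len(embed_fn("dimension probe"))

# Stand-in for a real embedding call; text-embedding-3-small would return 1536 floats.
fake_embed = lambda text: [0.0] * 1536
print(detect_dimension(fake_embed))  # prints: 1536
```

In practice `embed_fn` would wrap a real embedding request to the configured model, and the result would be passed as the index dimension when the vector index is created.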
## Input Examples

### File (Base64)

docs/my-website/docs/routing.md

Lines changed: 6 additions & 1 deletion

@@ -828,7 +828,12 @@ asyncio.run(router_acompletion())

 </TabItem>
-</Tabs>
+</Tabs>
+
+## Traffic Mirroring / Silent Experiments
+
+Traffic mirroring allows you to "mimic" production traffic to a secondary (silent) model for evaluation purposes. The silent model's response is gathered in the background and does not affect the latency or result of the primary request.
+
+[**See detailed guide on A/B Testing - Traffic Mirroring here**](./traffic_mirroring.md)

 ## Basic Reliability
docs/my-website/docs/traffic_mirroring.md

Lines changed: 83 additions & 0 deletions

@@ -0,0 +1,83 @@

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# A/B Testing - Traffic Mirroring

Traffic mirroring allows you to "mimic" production traffic to a secondary (silent) model for evaluation purposes. The silent model's response is gathered in the background and does not affect the latency or result of the primary request.

This is useful for:
- Testing a new model's performance on production prompts before switching.
- Comparing costs and latency between different providers.
- Debugging issues by mirroring traffic to a more verbose model.

## Quick Start

To enable traffic mirroring, add `silent_model` to the `litellm_params` of a deployment.

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/chatgpt-v-2",
            "api_key": "...",
            "silent_model": "gpt-4"  # 👈 Mirror traffic to gpt-4
        },
    },
    {
        "model_name": "gpt-4",
        "litellm_params": {
            "model": "openai/gpt-4",
            "api_key": "..."
        },
    }
]

router = Router(model_list=model_list)

# The request to "gpt-3.5-turbo" will trigger a background call to "gpt-4"
response = await router.acompletion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How does traffic mirroring work?"}]
)
```

</TabItem>
<TabItem value="proxy" label="Proxy">

Add `silent_model` to your `config.yaml`:

```yaml
model_list:
  - model_name: primary-model
    litellm_params:
      model: azure/gpt-35-turbo
      api_key: os.environ/AZURE_API_KEY
      silent_model: evaluation-model  # 👈 Mirror traffic here
  - model_name: evaluation-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```

</TabItem>
</Tabs>

## How it works

1. **Request Received**: A request is made to a model group (e.g. `primary-model`).
2. **Deployment Picked**: LiteLLM picks a deployment from the group.
3. **Primary Call**: LiteLLM makes the call to the primary deployment.
4. **Mirroring**: If `silent_model` is present, LiteLLM triggers a background call to that model.
   - For **Sync** calls: uses a shared thread pool.
   - For **Async** calls: uses `asyncio.create_task`.
5. **Isolation**: The background call uses a `deepcopy` of the original request parameters and sets `metadata["is_silent_experiment"] = True`. It also strips out logging IDs to prevent collisions in usage tracking.

## Key Features

- **Latency Isolation**: The primary request returns as soon as it's ready. The background (silent) call does not block.
- **Unified Logging**: Background calls are processed via the Router, meaning they are automatically logged to your configured observability tools (Langfuse, S3, etc.).
- **Evaluation**: Use the `is_silent_experiment: True` flag in your logs to filter and compare results between the primary and mirrored calls.
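The async flow above can be sketched as a fire-and-forget pattern. This is a simplified illustration of the technique, not the Router's actual code; `mirror_acompletion`, `primary_fn`, and `silent_fn` are hypothetical names:

```python
import asyncio
from copy import deepcopy

async def mirror_acompletion(primary_fn, silent_fn, silent_model=None, **params):
    """Fire-and-forget mirroring: the silent call never blocks the primary one."""
    if silent_model is not None:
        # Isolate the mirrored request: deep-copy params, tag it, retarget the model.
        silent_params = deepcopy(params)
        silent_params.setdefault("metadata", {})["is_silent_experiment"] = True
        silent_params["model"] = silent_model
        # Background task: scheduled on the loop, never awaited by the caller.
        asyncio.create_task(silent_fn(**silent_params))
    # The primary result returns as soon as it is ready.
    return await primary_fn(**params)

async def demo():
    mirrored = []
    async def primary(**p): return f"primary:{p['model']}"
    async def silent(**p): mirrored.append(p["model"])
    result = await mirror_acompletion(primary, silent, silent_model="gpt-4",
                                      model="gpt-3.5-turbo")
    await asyncio.sleep(0)  # yield once so the background task can run
    print(result, mirrored)  # prints: primary:gpt-3.5-turbo ['gpt-4']

asyncio.run(demo())
```

Because the background task is created but never awaited, any latency or failure in the silent model is invisible to the caller, which matches the latency-isolation property described above.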
