Commit c55e8f4

Add Pinecone mixin (grafana#1572)
* Add Pinecone mixin: dashboard with operations metrics and resource monitoring
* Update pinecone-mixin/config.libsonnet
* Update pinecone-mixin/README.md
* Update pinecone-mixin/rows.libsonnet
* Apply suggestions from code review
* Resolve dashboard lint errors
  - Add job template variable via groupLabels
  - Update filteringSelector to use the $job variable
  - Include instance in instanceLabels for instance variable
  - Set panel datasource to use template variables instead of mixed datasource
* Add alerts for pinecone mixin
* remove unused variables and fix import
* Apply suggestions from code review
* apply code review comments
  - Remove value duplication from config comments
  - Standardize grid position assignments to { gridPos+: { h, w } } format
  - Add 'cloud' and 'region' to groupLabels
  - Update alert 'for' duration from 3m to 5m
  - Update README placeholders to angle bracket format
* Migrate to Pinecone's new metric names from _total to _count or _sum
* remove redundant prefix from alert group name and clean up comments
* Apply review comments
  - Add table legends to operation panels for better readability
  - Remove pinecone_index_fullness

Co-authored-by: Emily <1282515+Dasomeone@users.noreply.github.com>
1 parent 250f0f4 commit c55e8f4

13 files changed: 977 additions, 0 deletions
pinecone-mixin/Makefile

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
include ../Makefile_mixin

pinecone-mixin/README.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Pinecone Mixin

The Pinecone mixin is a set of configurable Grafana dashboards and alerts based on the metrics exported by [Pinecone's built-in Prometheus exporter](https://docs.pinecone.io/guides/production/monitoring#monitor-with-prometheus).

## Dashboards

- **Pinecone Overview** - Provides details on index capacity, operation metrics (upsert, query, fetch, delete), operation durations, and resource usage (read/write units).

## Alerts

| Alert | Summary |
| ----- | ------- |
| PineconeHighQueryLatencyWarning | Query latency exceeds warning thresholds, indicating performance degradation in query operations. |
| PineconeHighQueryLatencyCritical | Query latency exceeds critical thresholds, indicating performance degradation in query operations. |
| PineconeHighUpsertLatencyWarning | Upsert latency exceeds warning thresholds, indicating performance degradation in upsert operations. |
| PineconeHighUpsertLatencyCritical | Upsert latency exceeds critical thresholds, indicating performance degradation in upsert operations. |
| PineconeUnitBurnDownWarning | Read or write unit (RU/WU) usage is increasing rapidly or nearing allocated limits, causing potential throttling or cost spikes. |

Alert thresholds can be configured in `config.libsonnet`. See the generated `prometheus_alerts.yaml` for default values.

## Configuration

This mixin is designed to work with Pinecone's built-in Prometheus exporter, which is available on Standard and Enterprise plans.

### Alloy

To monitor all serverless indexes in a project using Alloy, add the following to your Alloy configuration:

```alloy
// Discover the scrape targets for every serverless index in the project.
discovery.http "pinecone" {
  url              = "https://api.pinecone.io/prometheus/projects/<your-project-ID>/metrics/discovery"
  refresh_interval = "1m"

  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

prometheus.scrape "pinecone" {
  targets    = discovery.http.pinecone.targets
  forward_to = [prometheus.remote_write.metrics.receiver]

  // Assumption: the discovered metrics endpoints expect the same bearer
  // token as the discovery URL. Remove this block if they do not.
  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

prometheus.remote_write "metrics" {
  endpoint {
    url = "<your-prometheus-remote-write-endpoint>"
  }
}
```

Replace `<your-project-ID>` and `<your-api-key>` with your Pinecone project ID and API key. For more details, see the [Pinecone monitoring documentation](https://docs.pinecone.io/guides/production/monitoring#monitor-with-prometheus).

**Note:** If you have more than one Pinecone project, add a separate discovery and scrape configuration for each project, each with its own project ID and API key. Adding a `project_id` label via relabeling is recommended so that metrics from different projects can be told apart.

#### Alloy (Multiple Projects)

```alloy
discovery.http "pinecone_project_1" {
  url              = "https://api.pinecone.io/prometheus/projects/<your-project-ID>/metrics/discovery"
  refresh_interval = "1m"

  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

// Copy the project ID out of the discovery URL into a project_id label.
// __meta_url is the label discovery.http sets to the URL a target came from.
discovery.relabel "pinecone_project_1" {
  targets = discovery.http.pinecone_project_1.targets

  rule {
    source_labels = ["__meta_url"]
    regex         = ".*/projects/([^/]+)/.*"
    target_label  = "project_id"
    replacement   = "${1}"
  }
}

prometheus.scrape "pinecone_project_1" {
  targets    = discovery.relabel.pinecone_project_1.output
  forward_to = [prometheus.remote_write.metrics.receiver]

  // Assumption: the metrics endpoints expect the same bearer token as discovery.
  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

// Second project: use that project's own ID and API key.
discovery.http "pinecone_project_2" {
  url              = "https://api.pinecone.io/prometheus/projects/<your-project-ID>/metrics/discovery"
  refresh_interval = "1m"

  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

discovery.relabel "pinecone_project_2" {
  targets = discovery.http.pinecone_project_2.targets

  rule {
    source_labels = ["__meta_url"]
    regex         = ".*/projects/([^/]+)/.*"
    target_label  = "project_id"
    replacement   = "${1}"
  }
}

prometheus.scrape "pinecone_project_2" {
  targets    = discovery.relabel.pinecone_project_2.output
  forward_to = [prometheus.remote_write.metrics.receiver]

  authorization {
    type        = "Bearer"
    credentials = "<your-api-key>"
  }
}

prometheus.remote_write "metrics" {
  endpoint {
    url = "<your-prometheus-remote-write-endpoint>"
  }
}
```
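
The `project_id` relabel regex used above can be sanity-checked outside Alloy. The following is a minimal Python sketch, not part of the mixin: it applies the same pattern to a discovery URL with a full match, since Prometheus-style relabel regexes are anchored at both ends. The function name and the sample project ID are hypothetical.

```python
import re

# Same pattern as the relabel rule; the capture group is the project ID.
PROJECT_ID_PATTERN = re.compile(r".*/projects/([^/]+)/.*")


def project_id_from_url(url):
    """Return the project ID embedded in a discovery URL, or None if absent."""
    # fullmatch mimics the implicit ^...$ anchoring of relabel regexes.
    match = PROJECT_ID_PATTERN.fullmatch(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical project ID, for illustration only.
    sample = "https://api.pinecone.io/prometheus/projects/example-project/metrics/discovery"
    print(project_id_from_url(sample))
```

A URL without a `/projects/<id>/` segment returns `None`, which corresponds to the relabel rule leaving the target without a `project_id` label.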

## Install Tools

To use this mixin, you need a working Go toolchain with `jb`, `mixtool`, and `jsonnetfmt` installed.
To install them, run the following:

```bash
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest
go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest
```

## Generate Dashboards and Alerts

Edit `config.libsonnet` if required, then build the dashboard JSON files for Grafana and the alert rules for Prometheus:

```bash
make
```

Import the files in `dashboards_out` into your Grafana server, and load the `prometheus_alerts.yaml` file into Prometheus.

For more advanced uses of mixins, see [Prometheus Monitoring Mixins docs](https://github.com/monitoring-mixins/docs).
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
local xtd = import 'github.com/jsonnet-libs/xtd/main.libsonnet';

{
  new(this): {
    // Last instance label (index_name with the default config) identifies the index.
    local instanceLabel = xtd.array.slice(this.config.instanceLabels, -1)[0],
    // First instance label (instance with the default config) identifies the scrape target.
    local firstInstanceLabel = this.config.instanceLabels[0],
    prometheusAlerts: {
      groups: [
        {
          name: this.config.uid + '-alerts',
          rules: [
            {
              alert: 'PineconeHighQueryLatencyWarning',
              expr: 'rate(pinecone_db_op_query_duration_sum{%(filteringSelector)s}[5m]) / clamp_min(rate(pinecone_db_op_query_count{%(filteringSelector)s}[5m]), 1) > (%(queryLatencySimpleWarningMs)s / 1000)' % this.config {
                queryLatencySimpleWarningMs: this.config.alertsQueryLatencySimpleWarningMs,
              },
              'for': '5m',
              keep_firing_for: '5m',
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Query latency exceeds warning thresholds, indicating performance degradation in query operations.',
                description: 'Query latency on {{ $labels.%s }} (index: {{ $labels.%s }}) is {{ printf "%%.3f" $value }}s. This exceeds the warning threshold: > %sms.' % [
                  firstInstanceLabel,
                  instanceLabel,
                  this.config.alertsQueryLatencySimpleWarningMs,
                ],
              },
            },
            {
              alert: 'PineconeHighQueryLatencyCritical',
              expr: 'rate(pinecone_db_op_query_duration_sum{%(filteringSelector)s}[5m]) / clamp_min(rate(pinecone_db_op_query_count{%(filteringSelector)s}[5m]), 1) > (%(queryLatencySimpleCriticalMs)s / 1000)' % this.config {
                queryLatencySimpleCriticalMs: this.config.alertsQueryLatencySimpleCriticalMs,
              },
              'for': '5m',
              keep_firing_for: '5m',
              labels: {
                severity: 'critical',
              },
              annotations: {
                summary: 'Query latency exceeds critical thresholds, indicating performance degradation in query operations.',
                description: 'Query latency on {{ $labels.%s }} (index: {{ $labels.%s }}) is {{ printf "%%.3f" $value }}s. CRITICAL: This exceeds the critical threshold: > %sms.' % [
                  firstInstanceLabel,
                  instanceLabel,
                  this.config.alertsQueryLatencySimpleCriticalMs,
                ],
              },
            },
            {
              alert: 'PineconeHighUpsertLatencyWarning',
              expr: 'rate(pinecone_db_op_upsert_duration_sum{%(filteringSelector)s}[15m]) / clamp_min(rate(pinecone_db_op_upsert_count{%(filteringSelector)s}[15m]), 1) > (%(upsertLatencyWarningMs)s / 1000)' % this.config {
                upsertLatencyWarningMs: this.config.alertsUpsertLatencyWarningMs,
              },
              'for': '5m',
              keep_firing_for: '5m',
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Upsert latency exceeds warning thresholds, indicating performance degradation in upsert operations.',
                description: 'Upsert latency on {{ $labels.%s }} (index: {{ $labels.%s }}) is {{ printf "%%.3f" $value }}s. This exceeds the warning threshold: > %sms sustained.' % [
                  firstInstanceLabel,
                  instanceLabel,
                  this.config.alertsUpsertLatencyWarningMs,
                ],
              },
            },
            {
              alert: 'PineconeHighUpsertLatencyCritical',
              expr: 'rate(pinecone_db_op_upsert_duration_sum{%(filteringSelector)s}[15m]) / clamp_min(rate(pinecone_db_op_upsert_count{%(filteringSelector)s}[15m]), 1) > (%(upsertLatencyCriticalMs)s / 1000)' % this.config {
                upsertLatencyCriticalMs: this.config.alertsUpsertLatencyCriticalMs,
              },
              'for': '5m',
              keep_firing_for: '5m',
              labels: {
                severity: 'critical',
              },
              annotations: {
                summary: 'Upsert latency exceeds critical thresholds, indicating performance degradation in upsert operations.',
                description: 'Upsert latency on {{ $labels.%s }} (index: {{ $labels.%s }}) is {{ printf "%%.3f" $value }}s. CRITICAL: This exceeds the critical threshold: > %sms sustained.' % [
                  firstInstanceLabel,
                  instanceLabel,
                  this.config.alertsUpsertLatencyCriticalMs,
                ],
              },
            },
88+
{
89+
alert: 'PineconeUnitBurnDownWarning',
90+
expr: |||
91+
(
92+
rate(pinecone_db_read_unit_count{%(filteringSelector)s}[30m])
93+
/ clamp_min(rate(pinecone_db_read_unit_count{%(filteringSelector)s}[30m] offset 30m), 1)
94+
> (%(unitBurnDownBaselineIncreaseWarning)s / 100)
95+
)
96+
OR
97+
(
98+
rate(pinecone_db_write_unit_total{%(filteringSelector)s}[30m])
99+
/ clamp_min(rate(pinecone_db_write_unit_total{%(filteringSelector)s}[30m] offset 30m), 1)
100+
> (%(unitBurnDownBaselineIncreaseWarning)s / 100)
101+
)
102+
OR
103+
(
104+
increase(pinecone_db_read_unit_count{%(filteringSelector)s}[1h]) > 0
105+
AND
106+
100 * 24 * increase(pinecone_db_read_unit_count{%(filteringSelector)s}[1h]) / clamp_min(pinecone_db_read_unit_budget{%(filteringSelector)s}, 1) > %(unitBurnDownBudgetUsageWarning)s
107+
)
108+
OR
109+
(
110+
increase(pinecone_db_write_unit_total{%(filteringSelector)s}[1h]) > 0
111+
AND
112+
100 * 24 * increase(pinecone_db_write_unit_total{%(filteringSelector)s}[1h]) / clamp_min(pinecone_db_write_unit_budget{%(filteringSelector)s}, 1) > %(unitBurnDownBudgetUsageWarning)s
113+
)
114+
||| % this.config {
115+
unitBurnDownBaselineIncreaseWarning: this.config.alertsUnitBurnDownBaselineIncreaseWarning,
116+
unitBurnDownBudgetUsageWarning: this.config.alertsUnitBurnDownBudgetUsageWarning,
117+
},
118+
'for': '5m',
119+
keep_firing_for: '10m',
120+
labels: {
121+
severity: 'warning',
122+
},
123+
annotations: {
124+
summary: 'RU/WU usage increasing rapidly or nearing allocated limits, causing potential throttling or cost spikes.',
125+
description: 'Unit consumption on {{ $labels.%s }} (index: {{ $labels.%s }}) is high. This exceeds the warning threshold: either RU or WU rate > %s%% above 30-minute baseline or sustained usage over 1h > %s%% of allocated budget.' % [
126+
firstInstanceLabel,
127+
instanceLabel,
128+
(this.config.alertsUnitBurnDownBaselineIncreaseWarning - 100),
129+
this.config.alertsUnitBurnDownBudgetUsageWarning,
130+
],
131+
},
132+
},
133+
],
134+
},
135+
],
136+
},
137+
},
138+
}
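
The latency alerts above all share one shape: the per-second rate of the duration sum divided by the per-second rate of the operation count, with the denominator clamped to at least 1, compared against a millisecond threshold divided by 1000. The following is a minimal Python sketch of that arithmetic, not code from the mixin. The function name and sample numbers are hypothetical, and it assumes, as the alert descriptions do, that the resulting value is in seconds.

```python
def mean_latency_exceeds(duration_sum_rate, op_count_rate, threshold_ms):
    """Mirror `rate(sum) / clamp_min(rate(count), 1) > threshold_ms / 1000`.

    duration_sum_rate: per-second rate of the *_duration_sum counter.
    op_count_rate: per-second rate of the matching *_count counter.
    """
    # clamp_min(x, 1) never lets the denominator drop below 1, which also
    # avoids dividing by zero when no operations ran in the window.
    mean_latency = duration_sum_rate / max(op_count_rate, 1.0)
    return mean_latency > threshold_ms / 1000.0


if __name__ == "__main__":
    # 2 queries/s costing 0.5 s of duration per second: mean 0.25 s > 0.125 s.
    print(mean_latency_exceeds(0.5, 2.0, 125))
    # 0.1 queries/s costing 0.05 s per second: the clamp yields 0.05 s, so no alert.
    print(mean_latency_exceeds(0.05, 0.1, 125))
```

One consequence of the clamp, visible in the second call, is that indexes handling fewer than one operation per second report a value lower than their true mean latency, so these alerts are mainly sensitive on busier indexes.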

pinecone-mixin/config.libsonnet

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
{
  local this = self,
  dashboardTags: ['pinecone'],
  dashboardPeriod: 'now-1h',
  dashboardTimezone: 'default',
  dashboardRefresh: '1m',
  dashboardNamePrefix: 'Pinecone',
  uid: 'pinecone',

  // Filtering and labels
  filteringSelector: '',
  groupLabels: ['cloud', 'region', 'job'],
  instanceLabels: ['instance', 'index_name'],

  // Metrics source
  metricsSource: ['prometheus'],

  // Signals
  signals: {
    operations: (import './signals/operations.libsonnet')(this),
    overview: (import './signals/overview.libsonnet')(this),
  },

  // Feature flags
  enableLokiLogs: false,

  // Alert thresholds
  alertsQueryLatencySimpleWarningMs: 125,  // Suggested range: 100-125ms for simple queries
  alertsQueryLatencySimpleCriticalMs: 300,  // Suggested range: 200-300ms for simple queries

  alertsUpsertLatencyWarningMs: 250,  // Sustained latency threshold
  alertsUpsertLatencyCriticalMs: 500,  // Sustained latency threshold

  // Unit burn-down alert thresholds
  // Baseline increase thresholds as integer percentages (130 = 130% of baseline = 30% increase)
  alertsUnitBurnDownBaselineIncreaseWarning: 130,  // Percentage above 30-min baseline
  alertsUnitBurnDownBudgetUsageWarning: 80,  // Percentage of allocated budget
}
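
To make the two burn-down thresholds concrete, here is a minimal Python sketch of how they combine in the `PineconeUnitBurnDownWarning` expression for one unit type. It is illustrative only: the function name, argument names, and sample values are hypothetical, and reading `100 * 24 * increase(...[1h]) / budget` as a 24-hour projection of the last hour's usage is an interpretation rather than something the source states.

```python
def unit_burn_down_fires(rate_now, rate_prev, increase_1h, budget,
                         baseline_pct=130, budget_pct=80):
    """Evaluate the two burn-down conditions for a single unit type.

    rate_now / rate_prev: unit rate over the last 30m and the 30m before it.
    increase_1h: units consumed in the last hour.
    budget: value of the unit budget series.
    """
    # Condition 1: current rate vs. previous baseline, denominator clamped to 1.
    over_baseline = rate_now / max(rate_prev, 1.0) > baseline_pct / 100

    # Condition 2: 24x the last hour's usage as a percentage of the budget,
    # evaluated only when there was any usage at all.
    projected_pct = 100 * 24 * increase_1h / max(budget, 1.0)
    over_budget = increase_1h > 0 and projected_pct > budget_pct

    return over_baseline or over_budget


if __name__ == "__main__":
    print(unit_burn_down_fires(14, 10, 0, 10_000))    # ratio 1.4 exceeds 1.3
    print(unit_burn_down_fires(12, 10, 100, 10_000))  # ratio 1.2, projection 24%
    print(unit_burn_down_fires(12, 10, 400, 10_000))  # projection 96% exceeds 80%
```

With the defaults, a baseline value of 130 means the rate must be more than 1.3 times the previous window, which is why the alert description reports it as a 30% increase.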
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
local g = import './g.libsonnet';

{
  new(this):
    {
      'pinecone-overview.json':
        g.dashboard.new(this.config.dashboardNamePrefix + ' overview')
        + g.dashboard.withUid(this.config.uid + '-overview')
        + g.dashboard.withTags(this.config.dashboardTags)
        + g.dashboard.withTimezone(this.config.dashboardTimezone)
        + g.dashboard.withRefresh(this.config.dashboardRefresh)
        + g.dashboard.time.withFrom(this.config.dashboardPeriod)
        + g.dashboard.withVariables(
          this.signals.operations.getVariablesMultiChoice()
        )
        + g.dashboard.withPanels(
          g.util.grid.wrapPanels(
            this.grafana.rows.overview
            + this.grafana.rows.writeOperations
            + this.grafana.rows.readOperations
            + this.grafana.rows.resourceUsage
          )
        ),
    },
}

pinecone-mixin/g.libsonnet

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
import 'github.com/grafana/grafonnet/gen/grafonnet-v11.4.0/main.libsonnet'

pinecone-mixin/jsonnetfile.json

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
{
  "version": 1,
  "dependencies": [
    {
      "source": {
        "git": {
          "remote": "https://github.com/grafana/grafonnet.git",
          "subdir": "gen/grafonnet-v11.4.0"
        }
      },
      "version": "main"
    },
    {
      "source": {
        "git": {
          "remote": "https://github.com/grafana/jsonnet-libs.git",
          "subdir": "common-lib"
        }
      },
      "version": "master"
    }
  ],
  "legacyImports": true
}
