Add support for Vector Search Endpoint (direct only) #4887

Merged
janniklasrose merged 49 commits into main from janniklasrose/vs-endpoint on Apr 20, 2026
@janniklasrose janniklasrose commented Apr 2, 2026

Changes

Adds vector_search_endpoints as a first-class resource type, supported only by the direct deployment engine (no Terraform support).

New configuration surface

resources:
  vector_search_endpoints:
    my_endpoint:
      name: my-endpoint
      endpoint_type: STANDARD
      min_qps: 1
      budget_policy_id: my-policy
      permissions:
        - level: CAN_USE
          group_name: data-team

Required fields: name, endpoint_type. Optional: min_qps, budget_policy_id, permissions.

Key points to note

State ID = endpoint name. The CRUD API identifies endpoints by name; the UUID
(endpoint_uuid) is stored separately in the refresh output for use by the permissions API.

endpoint_type is immutable. Changing it triggers delete + recreate (resources.yml).
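
The immutability rule can be pictured as a small plan classifier. This is a minimal sketch, not the CLI's code: the `Endpoint` struct, the `planAction` function, and the `"skip"` label are hypothetical names chosen for illustration.

```go
package main

// Endpoint holds the fields relevant to this sketch (hypothetical type).
type Endpoint struct {
	Name         string
	EndpointType string
	MinQPS       int
}

// planAction returns an illustrative action label for moving from the
// saved state to the desired config. A change to the immutable
// endpoint_type forces delete + recreate; a change to min_qps is an
// in-place update.
func planAction(saved, desired Endpoint) string {
	switch {
	case saved.EndpointType != desired.EndpointType:
		return "recreate"
	case saved.MinQPS != desired.MinQPS:
		return "update"
	default:
		return "skip"
	}
}
```

When both fields change in the same deploy, the sketch returns `"recreate"`, since an in-place update cannot change the endpoint type.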

Two separate update APIs. DoUpdate dispatches to:

  • UpdateEndpointBudgetPolicy when budget_policy_id changes
  • PatchEndpoint when min_qps changes

These can fire in the same deploy if both fields change.
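
The dispatch described above could look roughly like the following. It is a sketch under stated assumptions: `EndpointConfig` and `updateCalls` are hypothetical, the function only returns the names of the API calls rather than issuing them, and the call order is an assumption.

```go
package main

// EndpointConfig mirrors the two mutable fields named above (hypothetical type).
type EndpointConfig struct {
	MinQPS         int
	BudgetPolicyID string
}

// updateCalls lists which update APIs a change from prev to next would need.
// The order (budget policy first) is an assumption made for this sketch.
func updateCalls(prev, next EndpointConfig) []string {
	var calls []string
	if prev.BudgetPolicyID != next.BudgetPolicyID {
		calls = append(calls, "UpdateEndpointBudgetPolicy")
	}
	if prev.MinQPS != next.MinQPS {
		calls = append(calls, "PatchEndpoint")
	}
	return calls
}
```

Changing both fields yields two calls in a single deploy; changing neither yields none.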

budget_policy_id drift is suppressed. The API returns effective_budget_policy_id
(which includes inherited workspace policies), not the user-set value. Until the SDK
exposes budget_policy_id separately, remote changes to this field are ignored
(reason: effective_vs_requested in resources.yml). See TODO in
bundle/direct/dresources/vector_search_endpoint.go:53.
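
To illustrate what suppressing drift on one field means, here is a minimal sketch. The map-based representation, the `ignoreRemoteChanges` table, and `remoteDrift` are all hypothetical; the real rule lives in `resources.yml`, whose format is not shown here.

```go
package main

import "sort"

// ignoreRemoteChanges maps a field name to the reason its remote value is
// not compared (hypothetical table, modelled on the reason quoted above).
var ignoreRemoteChanges = map[string]string{
	"budget_policy_id": "effective_vs_requested",
}

// remoteDrift returns the sorted names of fields whose remote value differs
// from the saved value, skipping fields listed in ignoreRemoteChanges.
func remoteDrift(saved, remote map[string]string) []string {
	var drifted []string
	for field, savedValue := range saved {
		if _, ignored := ignoreRemoteChanges[field]; ignored {
			continue
		}
		if remote[field] != savedValue {
			drifted = append(drifted, field)
		}
	}
	sort.Strings(drifted)
	return drifted
}
```

With this rule, a remote `budget_policy_id` that reflects an inherited workspace policy is not reported, while a remote change to `min_qps` still is.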

Permissions use UUID, not name. The PreparePermissionsInputConfig function uses
${...endpoint_uuid} as the object ID when constructing the permissions API path for
vector search endpoints.
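
A minimal sketch of keying the permissions request on the UUID instead of the name follows. The `EndpointState` type, the `permissionsPath` function, and in particular the `/permissions/vector-search-endpoints/` path segment are assumptions for illustration, not confirmed API details.

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointState holds what a refresh is assumed to return (hypothetical type).
type EndpointState struct {
	Name         string // state ID used by the CRUD API
	EndpointUUID string // object ID used by the permissions API
}

// permissionsPath builds an illustrative permissions path from the UUID.
// The path prefix is an assumption; only the use of the UUID comes from the PR.
func permissionsPath(s EndpointState) (string, error) {
	if s.EndpointUUID == "" {
		return "", errors.New("endpoint_uuid is not known yet for " + s.Name)
	}
	return fmt.Sprintf("/permissions/vector-search-endpoints/%s", s.EndpointUUID), nil
}
```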

Direct-only validation. ValidateDirectOnlyResources (bundle/config/mutator/) emits
an error at plan/deploy time if vector_search_endpoints are present in a non-direct bundle.
Vector Search Endpoints have no Terraform provider.
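
The check can be sketched as below. `validateDirectOnly` is a hypothetical stand-in for `ValidateDirectOnlyResources`, and the error wording is invented; the engine name `direct` matches the `DATABRICKS_BUNDLE_ENGINE=direct` value used in the acceptance tests.

```go
package main

import "fmt"

// validateDirectOnly returns an error when vector search endpoints are
// configured but the bundle does not use the direct engine.
// endpointKeys are the resource keys under resources.vector_search_endpoints.
func validateDirectOnly(engine string, endpointKeys []string) error {
	if engine == "direct" || len(endpointKeys) == 0 {
		return nil
	}
	return fmt.Errorf(
		"vector_search_endpoints %v require the direct deployment engine, got %q",
		endpointKeys, engine,
	)
}
```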

No dev-mode name prefix. Like UC resources, vector search endpoint names are NOT
prefixed with the dev user name in development mode.

Tests

  • Acceptance & Unit tests.
  • Tested e2e with CLI build.

Comment thread acceptance/bundle/invariant/configs/vector_search_endpoint.yml.tmpl
Comment thread acceptance/bundle/resources/vector_search_endpoints/drift/min_qps/script Outdated
Comment thread bundle/direct/dresources/permissions.go
Comment thread Makefile Outdated
Comment thread bundle/schema/jsonschema_for_docs.json
Comment thread bundle/internal/validation/generated/enum_fields.go Outdated
Comment thread bundle/direct/dresources/vector_search_endpoint.go Outdated
Comment thread bundle/direct/dresources/vector_search_endpoint.go Outdated
Comment thread bundle/direct/dresources/vector_search_endpoint.go Outdated
@janniklasrose janniklasrose added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit 5202ec2 Apr 20, 2026
24 checks passed
@janniklasrose janniklasrose deleted the janniklasrose/vs-endpoint branch April 20, 2026 09:50
bernardo-rodriguez pushed a commit to bernardo-rodriguez/b-cli that referenced this pull request Apr 21, 2026
…s#5046)

## Changes

Address review nits on databricks#4887:

- **`update/min_qps/script`**: drop redundant `--keep` + manual `rm`
pair in `print_requests()`. `print_requests.py` already deletes
`out.requests.txt` when `--keep` is omitted, so the pair was a no-op.
([thread](databricks#4887 (comment)))
- **`drift/min_qps/script`**: record `bundle plan --output json`
alongside the existing `contains.py` summary check, so the test pins
down that `min_qps` is the *only* field detected as changed (old=1,
new=1, remote=5), not just the overall count.
([thread](databricks#4887 (comment)))
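
The three values in that test (old=1, new=1, remote=5) suggest a three-way comparison per field. The sketch below assumes `old` is the last deployed value and `new` is the value in the bundle config; `classifyField` and its labels are hypothetical, not the CLI's plan vocabulary.

```go
package main

// classifyField compares the last deployed value (old), the configured
// value (desired), and the value read back from the workspace (remote).
func classifyField(old, desired, remote int) string {
	switch {
	case desired != old:
		return "config_change" // the user edited the bundle config
	case remote != old:
		return "remote_drift" // someone changed the resource out-of-band
	default:
		return "unchanged"
	}
}
```

For the drift test's values, `classifyField(1, 1, 5)` falls into the remote-drift case, which is the situation the recorded plan output pins down for `min_qps`.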

Not in this PR: the [`recreated_same_name` badness
thread](databricks#4887 (comment)).
That requires a real behavior change (storing `endpoint_uuid` in state
and comparing it via `OverrideChangeDesc`, similar to `dashboards.go`'s
etag pattern), so it'll get its own follow-up PR.

## Tests

- `go test ./acceptance -run
TestAccept/bundle/resources/vector_search_endpoints/update/min_qps`
- `go test ./acceptance -run
TestAccept/bundle/resources/vector_search_endpoints/drift/min_qps`
deco-sdk-tagging Bot added a commit that referenced this pull request Apr 22, 2026
## Release v0.298.0

### CLI
* Added `--limit` flag to all paginated list commands for client-side result capping ([#4984](#4984)). On `jobs list` and `jobs list-runs` the former API page-size flag was renamed to `--page-size` (hidden) to avoid collision.
* Accept `yes` in addition to `y` for confirmation prompts, and show `[y/N]` to indicate that no is the default.
* Cache `/.well-known/databricks-config` lookups under `~/.cache/databricks/<version>/host-metadata/` so repeat CLI invocations against the same host skip the ~700ms discovery round trip.
* Deprecated `auth env`. The command is hidden from help listings and prints a deprecation warning to stderr; it will be removed in a future release.

### Bundles
* Remove `experimental-jobs-as-code` template, superseded by `pydabs` ([#4999](#4999)).
* Prompt before destroying or recreating Lakebase resources (database instances, synced database tables, postgres projects and branches) ([#5052](#5052)).
* Treat deleted resources as not running in the `fail-on-active-runs` check ([#5044](#5044)).
* engine/direct: Added support for Vector Search Endpoints ([#4887](#4887)).
* engine/direct: Exclude deploy-only fields (e.g. `lifecycle`) from the Apps update mask so requests that change both `description` and `lifecycle.started` in the same deploy no longer fail with `INVALID_PARAMETER_VALUE` ([#5042](#5042), [#5051](#5051)).
* engine/direct: Fix phantom diffs from `depends_on` reordering in job tasks ([#4990](#4990)).

### Dependency updates
* Bump `github.com/databricks/databricks-sdk-go` from v0.126.0 to v0.128.0 ([#4984](#4984), [#5031](#5031)).
* Bump Go toolchain to 1.25.9 ([#5004](#5004)).
shreyas-goenka added a commit that referenced this pull request May 3, 2026
…ting principal (#5151)

## Summary

The invariant test config used `user_name: viewer@example.com`, which
doesn't exist in the cloud workspaces. The Permissions Set API silently
drops the unknown user, so a Read after deploy returns an ACL without
that entry; the no_drift invariant then sees a phantom update and the
test fails on aws-prod-ucws.

Pre-existing bug from #4887, not caught earlier because deploy itself
was failing on the 50-char endpoint name limit (#5108) before reaching
the no_drift check.
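
The phantom update comes down to a set difference between the configured ACL and the ACL read back. A minimal sketch, with a hypothetical `Permission` type and `missingRemotely` helper:

```go
package main

// Permission is one ACL entry (hypothetical, flattened to two fields).
type Permission struct {
	Level     string
	Principal string
}

// missingRemotely returns the configured entries that are absent from the
// ACL read back after deploy. Any non-empty result shows up as an update
// in the next plan, which is what breaks a no-drift check.
func missingRemotely(desired, remote []Permission) []Permission {
	seen := make(map[Permission]bool, len(remote))
	for _, p := range remote {
		seen[p] = true
	}
	var missing []Permission
	for _, p := range desired {
		if !seen[p] {
			missing = append(missing, p)
		}
	}
	return missing
}
```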

### Failure shape (before this fix)

```
"resources.vector_search_endpoints.bar.permissions": {
  "action": "update",
  "new_state": {
    "value": {
      "__embed__": [
        { "level": "CAN_USE", "user_name": "viewer@example.com" },
        { "level": "CAN_MANAGE", "service_principal_name": "[USERNAME]" }
      ]
    }
  },
  "remote_state": {
    "__embed__": [
      { "level": "CAN_MANAGE", "service_principal_name": "[USERNAME]" }
    ]
  },
  ...
}
```

### Change

Use `group_name: users` (always present in every workspace) to match
the pattern used by the other `*_with_permissions` invariant configs
(`job_with_permissions`, `model_with_permissions`,
`secret_scope_with_permissions`).

## Test plan

- [x] Local: `go test ./acceptance -run
'TestAccept/bundle/invariant/no_drift/DATABRICKS_BUNDLE_ENGINE=direct/INPUT_CONFIG=vector_search_endpoint'`
passes
- [x] Cloud: same target passes on aws-prod-ucws

This pull request was AI-assisted by Isaac.
denik added a commit that referenced this pull request May 20, 2026
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
denik pushed a commit that referenced this pull request May 20, 2026
denik pushed a commit that referenced this pull request May 20, 2026
denik pushed a commit that referenced this pull request May 20, 2026
…ting principal (#5151)

TanishqDatabricks pushed a commit to TanishqDatabricks/cli that referenced this pull request May 22, 2026
…ting principal (databricks#5151)

denik pushed a commit that referenced this pull request May 28, 2026
## Changes

Adds `vector_search_indexes` as a first-class DABs resource on the
direct engine, alongside the existing `vector_search_endpoints`. Direct
engine only — vector search has no Terraform provider.

```yaml
resources:
  vector_search_endpoints:
    my_endpoint:
      name: my-endpoint
      endpoint_type: STANDARD
  vector_search_indexes:
    my_index:
      name: main.default.my_index
      endpoint_name: ${resources.vector_search_endpoints.my_endpoint.name}
      primary_key: id
      index_type: DELTA_SYNC
      delta_sync_index_spec:
        source_table: main.default.source
        pipeline_type: TRIGGERED
      grants:
        - principal: data-engineers
          privileges: [SELECT]
```

What's included:

- **Resource model** in `bundle/config/resources/vector_search_index.go`
(with `grants`) and `bundle/direct/dresources/vector_search_index.go`
(state, lifecycle, drift classification). `RemapState` round-trips
`index_subtype` so a populated remote subtype isn't classified as drift
on the next plan.
- **UC grants** wired through the generic grants path with securable
type `table`.
- **`recreate_on_changes`** for immutable spec fields (`name`,
`endpoint_name`, `index_type`, `index_subtype`, `primary_key`,
`delta_sync_index_spec`, `direct_access_index_spec`);
`delta_sync_index_spec.columns_to_sync` marked `ignore_remote_changes`
(request-only field — see follow-up note below). The index API has no
rename or update path, so any config-side change has to round-trip
through delete + create.
- **Index orphaning detection**: index state persists the
`endpoint_uuid` of the endpoint it was created against. `DoRead` looks
up the current endpoint UUID by name; if the endpoint was deleted
out-of-band the lookup returns `""` and `OverrideChangeDesc` classifies
the saved-vs-remote mismatch as `Recreate`. Builds on the endpoint UUID
persistence merged in #5127.
- **Async delete handling**: new optional `WaitAfterDelete` adapter
method (sibling to `WaitAfterCreate` / `WaitAfterUpdate`). For VS
indexes it polls `GetIndex` until 404 (15-minute cap). `apply.Recreate`
runs `DoDelete → DeleteState → WaitAfterDelete → DoCreate → SaveState →
WaitAfterCreate`, so a wait-time failure leaves the bundle consistent.
Replaces the prior `SaveState("", nil, nil)` placeholder that produced
`invalid state: empty id` planning failures on partial recreate.
- **Destructive-action prompt** for VS indexes in `bundle/phases/`. The
message intentionally covers both Delta Sync ("re-runs the embedding
pipeline") and Direct Access ("upserted vectors lost") in one paragraph
— picking a type-specific message from the bundle config would be wrong
on type changes (`DELTA_SYNC` → `DIRECT_ACCESS` recreates would describe
the destination type while the actual teardown is of the source type).
- **Dev-mode name prefixing** for indexes prefixes only the leaf
component of `catalog.schema.name`, since catalog and schema are
external references (the previous behavior produced invalid names like
`dev_jan_main.default.my_index`). The mutator skips names that still
carry literal `${...}` tokens, since the leaf split would otherwise
inject the prefix inside the trailing ref expression itself.
- **Testserver** enforces endpoint existence on index create. Index
status returns `Ready: true` immediately, matching the convention used
by every other slow resource the testserver fakes (endpoints → `ONLINE`,
database instances → `AVAILABLE`, apps → `RUNNING`).
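
The orphaning check in the list above can be sketched as a comparison of the saved endpoint UUID against a by-name lookup. `IndexState`, `classifyIndex`, and the map standing in for the lookup are hypothetical; the real logic sits in `DoRead` and `OverrideChangeDesc`.

```go
package main

// IndexState is the slice of saved index state relevant here (hypothetical type).
type IndexState struct {
	Name         string
	EndpointName string
	EndpointUUID string // UUID of the endpoint the index was created against
}

// classifyIndex looks up the endpoint's current UUID by name. A missing
// endpoint yields "" from the map, which never matches a saved UUID, so a
// deleted or recreated endpoint is classified as a recreate.
func classifyIndex(saved IndexState, currentUUIDByName map[string]string) string {
	current := currentUUIDByName[saved.EndpointName]
	if current != saved.EndpointUUID {
		return "recreate"
	}
	return "skip"
}
```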

`index_type` / spec-block consistency is intentionally **not** validated
client-side — the CreateIndex API rejects mismatched combinations at
deploy time, and replicating that check in DABs would just duplicate
backend logic.

## Why

The direct engine recently gained `vector_search_endpoints` (#4887).
This PR extends the support to indexes, which were the missing half.
Along the way it surfaces and fixes a number of issues:

- Without persisted endpoint UUIDs, identity drift was undetectable. An
index pointing at a deleted-and-recreated endpoint would appear live by
name but its backing endpoint was gone, leading to confusing "index
already exists" errors on subsequent deploys. #5127 added the same UUID
tracking on the endpoint side; this PR mirrors it on the index side so
the orphan is caught.
- The async deletion model isn't documented in the SDK, but `recreate`
deploys hit it every time. Without a wait, every recreate failed on the
immediate Create.
- `apply.Recreate` was writing a malformed empty-ID state entry as its
"delete state" step, which then poisoned the next plan with `invalid
state: empty id`.
- Recreating a VS index is genuinely expensive — Delta Sync re-runs the
full embedding pipeline; Direct Access loses every upserted vector. The
destructive-action prompt now reflects that.

## Follow-ups

- **`delta_sync_index_spec.columns_to_sync`** is request-only in the SDK
today: the field is accepted on `Create` but the `Get` response doesn't
echo it back, which is why we mark it `ignore_remote_changes` here.
There's an open backend PR to expose `columns_to_sync` on the read path;
once the SDK is regenerated against that, we can drop the
`ignore_remote_changes` entry and let normal drift detection handle the
field.
- **`vector_search_endpoints.budget_policy_id`** drift (effective vs.
requested) and the SDK doc-comment for
**`vector_search_endpoints.usage_policy_id`** are intentionally not in
this PR — both will be addressed by the next SDK bump and the
corresponding `./task generate-schema` regen.

## Tests

- `./task fmt`, `./task checks`, `./task lint` — all clean.
- `./task test` — unit tests green across `bundle/...`.
- New unit test `TestVectorSearchIndexNameWithUnresolvedRefsLeftAlone`
in `apply_target_mode_test.go` exercises the leaf-prefix skip on
`${var.catalog}.${var.schema}.${var.index}`.
- New acceptance directories under
`acceptance/bundle/resources/vector_search_indexes/`: `basic`,
`drift/columns_to_sync`, `drift/deleted_remotely`,
`drift/orphaned_endpoint`, `recreate/index_type`,
`recreate/mixed_types`, `grants/select`.
- The recreate request log
(`recreate/index_type/out.requests.recreate.direct.json`) captures `GET
→ DELETE → GET → POST` with `--get` enabled in `print_requests.py`. The
middle `GET` is the `WaitAfterDelete` poll; if a future change drops the
wait the regenerated capture loses that line and the test fails.
- `acceptance/bundle/validate/presets_name_prefix` covers the leaf-only
name prefix on a 3-part index name.
- `acceptance/bundle/invariant/configs/vector_search_index.yml.tmpl`
exercises the resource through the invariant matrix; the testserver
enforces endpoint existence on index create.
- Live tested with `--profile tmp` against staging across initial deploy
/ drift / recreate / destroy.
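
The leaf-only prefix behaviour exercised by the name-prefix tests can be sketched as follows. `prefixIndexName` is a hypothetical helper, and the `dev_jan_` prefix is taken from the invalid-name example earlier in this description.

```go
package main

import "strings"

// prefixIndexName applies prefix to the last dot-separated component of a
// catalog.schema.name index name. Names that still contain an unresolved
// ${...} reference are returned unchanged.
func prefixIndexName(name, prefix string) string {
	if strings.Contains(name, "${") {
		return name
	}
	// LastIndex returns -1 when there is no dot, so the whole name is the leaf.
	i := strings.LastIndex(name, ".")
	return name[:i+1] + prefix + name[i+1:]
}
```

Under these assumptions `main.default.my_index` becomes `main.default.dev_jan_my_index`, while `${var.catalog}.${var.schema}.${var.index}` is left alone.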

_This PR was written by Claude Code._
bernardo-rodriguez pushed a commit to bernardo-rodriguez/b-cli that referenced this pull request Jun 2, 2026