Added support for lifecycle.started option #4672

Open
andrewnester wants to merge 44 commits into main from feat/lifecycle-started

@andrewnester
Contributor

Changes

Added support for lifecycle.started option

Why

This new option allows resources such as apps, clusters, and SQL warehouses to be deployed in a started/active state.
For apps: when this option is enabled, each bundle deploy automatically triggers a new app deployment.

Tests

Added an acceptance test

@eng-dev-ecosystem-bot
Collaborator

eng-dev-ecosystem-bot commented Mar 6, 2026

Commit: ef5f09a

Run: 23750330860

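| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- | --- |
| 🟨 aws linux | 7 | | 10 | 270 | 817 | 6:36 |
| 🟨 aws windows | 7 | | 10 | 272 | 815 | 6:34 |
| 💚 aws-ucws linux | | 7 | 10 | 366 | 733 | 7:50 |
| 💚 aws-ucws windows | | 7 | 10 | 368 | 731 | 5:46 |
| 💚 azure linux | | 1 | 12 | 273 | 815 | 6:23 |
| 💚 azure windows | | 1 | 12 | 275 | 813 | 4:46 |
| 💚 azure-ucws linux | | 1 | 12 | 371 | 729 | 7:45 |
| 💚 azure-ucws windows | | 1 | 12 | 373 | 727 | 5:03 |
| 💚 gcp linux | | 1 | 12 | 269 | 818 | 6:01 |
| 💚 gcp windows | | 1 | 12 | 271 | 816 | 6:35 |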
17 interesting tests: 10 SKIP, 7 KNOWN
| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 🟨 K | 🟨 K | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
Top 20 slowest tests (at least 2 minutes):
duration env testname
4:19 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:45 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:39 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:14 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:13 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:12 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:11 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:10 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:08 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:07 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:51 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:49 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:47 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:46 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:45 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:39 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:19 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:18 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:11 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

Member

@simonfaltum simonfaltum left a comment

Review (automated, 2 agents)

Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion

[Critical] DoCreate never deploys app code when lifecycle.started=true

bundle/direct/dresources/app.go (DoCreate)

DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.

Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.

[Critical] All local-only fields Skipped, preventing DoUpdate from running

bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.

Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.

[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op

bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go

started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.

Suggestion: Plan an explicit Start step when started=true and resource is stopped.

[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

bundle/config/resources/lifecycle.go:18-32

If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.

[Major] plan_test.go lost coverage breadth

bundle/phases/plan_test.go

Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.

[Major] No validation for lifecycle.started on unsupported resource types

bundle/config/mutator/validate_lifecycle_started.go:30-46

Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.

[Gap (Major)] Acceptance test never tests started-only toggle

The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.

[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started

Only the app path is tested.

[Nit] Validation error doesn't identify which resource

validate_lifecycle_started.go:40-46 - Include resource key in error message.

[Nit] Duplicate lifecycle entries in schema output

out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.

[Nit] Redundant zero-value assignments in RemapState

app.go:93-100 - Explicit zero values are unnecessary in Go struct init.

Contributor

@shreyas-goenka shreyas-goenka left a comment

Note: This review was posted by Claude (AI assistant).

Priority: HIGH — Several critical correctness issues

MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys

For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.

MAJOR: If only started changes on an app, DoUpdate is never called

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.

MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

type LifecycleWithStarted struct {
    PreventDestroy bool  `json:"prevent_destroy,omitempty"`
    Started        *bool `json:"started,omitempty"`
}

Should embed Lifecycle instead:

type LifecycleWithStarted struct {
    Lifecycle
    Started *bool `json:"started,omitempty"`
}

Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.
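
To make the embedding point concrete, here is a minimal, self-contained sketch. It reuses the two struct shapes quoted above; the `encodeLifecycle` helper and the `main` function are illustrative additions and are not part of the PR. It shows that `encoding/json` flattens the embedded struct, so `prevent_destroy` still serializes at the same level as `started`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Lifecycle mirrors the base lifecycle struct discussed in the review.
type Lifecycle struct {
	PreventDestroy bool `json:"prevent_destroy,omitempty"`
}

// LifecycleWithStarted embeds Lifecycle, so any field added to Lifecycle
// later is inherited automatically.
type LifecycleWithStarted struct {
	Lifecycle
	Started *bool `json:"started,omitempty"`
}

// encodeLifecycle is a hypothetical helper used only for this illustration.
func encodeLifecycle(l LifecycleWithStarted) string {
	b, err := json.Marshal(l)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	started := true
	l := LifecycleWithStarted{
		Lifecycle: Lifecycle{PreventDestroy: true},
		Started:   &started,
	}
	// The embedded field is promoted, so l.PreventDestroy works directly.
	fmt.Println(l.PreventDestroy)
	// Embedded fields are flattened into the same JSON object.
	fmt.Println(encodeLifecycle(l))
}
```

Callers that read `lifecycle.PreventDestroy` keep working unchanged because of field promotion; only struct literals need the extra `Lifecycle: Lifecycle{...}` nesting.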

MAJOR: Field shadowing creates duplicate lifecycle schema entries

Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:

resources.apps.*.lifecycle  resources.Lifecycle           INPUT
resources.apps.*.lifecycle  resources.LifecycleWithStarted  INPUT

MEDIUM: ILifecycle naming not idiomatic Go

The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.

MEDIUM: Lost parametric test coverage

The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.

MEDIUM: No unit tests for ValidateLifecycleStarted

The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.

What looks good

  • appdeploy package extraction is a clean DRY improvement
  • Test server additions are thorough with proper state management
  • Schema and annotation descriptions are clear
  • The overall feature design is well thought out

Focus areas for review

  1. Cluster/warehouse update path — started=true ineffective after creation
  2. App started-only toggle — silent no-op
  3. Field embedding — LifecycleWithStarted should embed Lifecycle
  4. Test coverage restoration

@andrewnester
Contributor Author

[Critical] DoCreate never deploys app code when lifecycle.started=true
MAJOR: If only started changes on an app, DoUpdate is never called
MEDIUM: No unit tests for ValidateLifecycleStarted
[Major] No validation for lifecycle.started on unsupported resource types

All of these are expected

Contributor

@denik denik left a comment

Summary of offline discussion:

  • We should have a test where we change only the config entry from started=false to started=true and vice versa. This should trigger only a Start/Stop call, not an update call (we should record requests to confirm).
  • started=false should not be the same as started omitted. It should mean stopped, and omitted should mean "don't care about start/stop status", which is backward compatible with the current behaviour.
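
The three-way semantics above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `lifecycle` struct, `planStartStop`, `decodeAndPlan`, and the string action names are all hypothetical. The only thing taken from the discussion is that `started` is a `*bool`, so an omitted value (nil) can be told apart from an explicit `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycle is a hypothetical stand-in for the config struct; Started is a
// pointer so that "omitted" (nil) differs from "false".
type lifecycle struct {
	Started *bool `json:"started,omitempty"`
}

// planStartStop returns the action needed to converge on the desired state.
// nil means "don't care", so no Start/Stop call is planned.
func planStartStop(started *bool, running bool) string {
	switch {
	case started == nil:
		return "none"
	case *started && !running:
		return "start"
	case !*started && running:
		return "stop"
	}
	return "none"
}

// decodeAndPlan parses a lifecycle JSON fragment and plans against the
// (assumed) remote running state.
func decodeAndPlan(raw string, running bool) (string, error) {
	var l lifecycle
	if err := json.Unmarshal([]byte(raw), &l); err != nil {
		return "", err
	}
	return planStartStop(l.Started, running), nil
}

func main() {
	for _, raw := range []string{`{}`, `{"started": false}`, `{"started": true}`} {
		action, err := decodeAndPlan(raw, true)
		fmt.Println(raw, "->", action, err)
	}
}
```

Keeping the decision in one small function makes the started-only toggle easy to cover in a unit test, independently of whether an update call is also planned.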

@@ -0,0 +1,10 @@

>>> update_file.py databricks.yml my_app_description MY_APP_DESCRIPTION
Contributor

The test is a bit difficult to read because the update operations are separated from the actual applies / assertions in out.deploy.direct.txt. Can we inline these update operations there as well? No need for an output.txt here.

}

// Anonymous embedded structs are transparent in JSON; skip them as standalone fields.
if field.Anonymous {
Contributor

question - Why did we need this, it does not seem like we added any new required fields? The generated code did not change?

Contributor Author

We need this so that lifecycle is not added to the required fields. I presume it has something to do with Lifecycle also being defined in BaseResource, which is inlined in the App struct.

Contributor

Would it work if we do this:

Lifecycle *LifecycleWithStarted `json:"lifecycle,omitempty"`

instead of this:

Lifecycle LifecycleWithStarted `json:"lifecycle,omitempty"`

?

Then it should not be included in required. Pointer is more logical there given that we want the whole struct to be optional?

I'm worried skipping anonymous structs wholesale can have unintended consequences (nothing inside embedded struct is processed even if it should).

cc @shreyas-goenka
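
As background for the pointer-versus-value question, here is a small sketch of how `encoding/json` itself treats the two declarations. It says nothing about how the required-fields generator in this repository behaves; all type names (`lifecycleOpts`, `appByValue`, `appByPointer`) and the `mustJSON` helper are hypothetical. The standard-library behaviour it relies on is that `omitempty` never omits a struct value, while a nil pointer is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycleOpts is a hypothetical stand-in for LifecycleWithStarted.
type lifecycleOpts struct {
	Started *bool `json:"started,omitempty"`
}

// appByValue declares the lifecycle block as a struct value.
type appByValue struct {
	Name      string        `json:"name"`
	Lifecycle lifecycleOpts `json:"lifecycle,omitempty"`
}

// appByPointer declares the lifecycle block as a pointer.
type appByPointer struct {
	Name      string         `json:"name"`
	Lifecycle *lifecycleOpts `json:"lifecycle,omitempty"`
}

// mustJSON marshals v and panics on error; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// omitempty has no effect on a struct value, so an empty object is emitted.
	fmt.Println(mustJSON(appByValue{Name: "a"}))
	// A nil pointer is omitted entirely.
	fmt.Println(mustJSON(appByPointer{Name: "a"}))
}
```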

Contributor Author

*LifecycleWithStarted makes sense independently, but the problem is slightly different: the LifecycleWithStarted struct embeds Lifecycle anonymously (no json tag).
Without the check for a missing json tag, "lifecycle" gets added as required under "resources.apps.*". I adjusted the change in required.go to reflect that we actually want to skip fields with empty json tags.

@andrewnester andrewnester requested a review from denik March 27, 2026 14:14

request := apps.AsyncUpdateAppRequest{
App: &config.App,
AppName: id,
UpdateMask: updateMask,
Contributor

What kind of format does updateMask support? Is it only top-level fields, or inner fields as well?

The keys in Changes have indices, both integer (tasks[0]) and key-value (tasks[task_name="hello"]). I doubt that makes sense for the backend; the latter is very DABs-specific.

Contributor Author

Yes, it supports nested fields, but it does not allow specifying elements of sequence or map fields; only the entire collection field can be specified. I'll fix that.
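
One way to get from indexed change keys to collection-level mask entries is to truncate each path at its first index selector. The sketch below is an assumption-laden illustration: `collectionMaskPath` and `buildUpdateMask` are hypothetical names, and joining the entries with commas is an assumed mask format rather than something confirmed in this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// collectionMaskPath truncates a change path at the first index selector,
// so `resources[0].name` and `resources[name="x"].description` both become
// "resources". Paths without a selector are returned unchanged.
func collectionMaskPath(changePath string) string {
	if i := strings.IndexByte(changePath, '['); i >= 0 {
		return changePath[:i]
	}
	return changePath
}

// buildUpdateMask deduplicates the truncated paths, sorts them for stable
// output, and joins them with commas (assumed format).
func buildUpdateMask(changePaths []string) string {
	seen := map[string]bool{}
	var out []string
	for _, p := range changePaths {
		m := collectionMaskPath(p)
		if m == "" || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	sort.Strings(out)
	return strings.Join(out, ",")
}

func main() {
	fmt.Println(buildUpdateMask([]string{
		"description",
		"resources[0].name",
		`resources[name="x"].description`,
	}))
}
```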

Contributor

Why calculate the mask at all? Our model is that we always update the full resource, so can't we just use a statically configured mask? (Or omit the mask; not sure if that works.)

Contributor

Cannot we just do this?

UpdateMask: []string{"budget_policy_id", "compute_size", "description", "resources", "status", "usage_policy_id", "user_api_scopes"}

Contributor

or even UpdateMask: "*"?

The docs say it's not recommended if new field is added to backend BUT

  1. it's an issue with all APIs that we use that don't use updateMask (most of them)
  2. to properly support a new field, we need to rebuild DAB with that field in SDK. Wiping it with default otherwise does not seem unreasonable.

apps.App
Config *resources.AppConfig `json:"config,omitempty"`
GitSource *apps.GitSource `json:"git_source,omitempty"`
Lifecycle *AppStateLifecycle `json:"lifecycle,omitempty"`
Contributor

question - would it make sense to embed AppRemote into AppState? they overlap except for one field.

Contributor Author

We could, but it requires changes to findStructFieldByKey, since it only walks one level of anonymous embeds. Double embedding (AppState -> AppRemote -> apps.App) breaks field access for apps.App fields like name.

"config": true,
"git_source": true,
"lifecycle": true,
"lifecycle.started": true,
Contributor

But these fields do have a remote counterpart now? (All but source_code_path; see the separate question about that.)

Contributor Author

They do, but they are managed via the Deploy API, not the App Update API, so they must not appear in update_mask. I'll update the comment.

@andrewnester andrewnester requested a review from denik March 30, 2026 09:59

title "Deploy bundle"
trace $CLI bundle deploy
trace $CLI bundle run my_app > /dev/null || true
Contributor

Is it expected to fail? Then we should use my_app.

If it only fails sometimes, then I wonder what we get from having this command in the acceptance test.

@@ -0,0 +1,3 @@
# Run the app after the deploy otherwise migrate will show the drift on the source code path.
Contributor

This means that users of app will also experience drift until they do "run"?

Ideally we want to assert that right after deploy there is no drift.

Since it's a known behavior for this field, is it possible to account for it in resources.yml or in OverrideChangeDesc?

trace $CLI bundle plan
trace $CLI bundle deploy
trace $CLI bundle run mykey
trace print_requests >> out.requests.$DATABRICKS_BUNDLE_ENGINE.json
Contributor

are requests for run expected to differ between runs? From quick glance it does not seem so.

In that case, can we record a diff instead?

trace $CLI bundle run mykey > out.requests.run1.json
trace $CLI bundle run mykey > tmp.requests.run2.json
trace diff.py out.requests.run1.json tmp.requests.run2.json

We can also add a helper:

trace $CLI bundle run mykey > out.requests.run1.json
trace $CLI bundle run mykey | diff_again.py out.requests.run1.json


trace $CLI bundle plan
trace $CLI bundle deploy
trace $CLI bundle run mykey
Contributor

Are requests for run expected to differ between engines? It looks like they are the same. In that case, can we move them to non-engine specific dedicated files? e.g. out.requests.run1.json

deployment.Command = config.Command
}

if len(config.Env) > 0 {
Contributor

nit: is the if necessary? The for-loop already handles the empty case.
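
A quick illustration of why the length guard is redundant: ranging over a nil or empty slice in Go simply runs zero iterations. The `EnvVar` type and `envPairs` function below are hypothetical and are not the PR's code.

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in for an app environment entry.
type EnvVar struct {
	Name  string
	Value string
}

// envPairs formats entries as NAME=VALUE. No len(env) > 0 guard is needed:
// the loop body never runs for a nil or empty slice.
func envPairs(env []EnvVar) []string {
	var pairs []string
	for _, e := range env {
		pairs = append(pairs, e.Name+"="+e.Value)
	}
	return pairs
}

func main() {
	fmt.Println(len(envPairs(nil)))
	fmt.Println(envPairs([]EnvVar{{Name: "MODE", Value: "prod"}}))
}
```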

// lifecycle.started is a direct-mode-only feature.
if !m.engine.IsDirect() {
path := "resources." + group.Description.PluralName + "." + key + ".lifecycle.started"
diags = diags.Append(diag.Diagnostic{
Contributor

nit: logdiag.LogError does not require collection.

}

// lifecycle.started is a direct-mode-only feature.
if !m.engine.IsDirect() {
Contributor

Can this check be moved to the top level? We don't need the iteration for the direct engine.

if app.ActiveDeployment != nil {
// The source code path in active deployment is snapshotted version of the source code path in the app.
// We need to use the default source code path to get the correct source code path for drift detection.
remote.SourceCodePath = app.DefaultSourceCodePath
Contributor

Question, why not always set SourceCodePath to app.DefaultSourceCodePath? (even if app.ActiveDeployment="")

5 participants