Added support for lifecycle.started option #4672
Commit: ef5f09a
17 interesting tests: 10 SKIP, 7 KNOWN
Top 20 slowest tests (at least 2 minutes):
simonfaltum
left a comment
There was a problem hiding this comment.
Review (automated, 2 agents)
Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion
[Critical] DoCreate never deploys app code when lifecycle.started=true
bundle/direct/dresources/app.go (DoCreate)
DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.
Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.
[Critical] All local-only fields Skipped, preventing DoUpdate from running
bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.
Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.
[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op
bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go
started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.
Suggestion: Plan an explicit Start step when started=true and resource is stopped.
[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
bundle/config/resources/lifecycle.go:18-32
If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.
[Major] plan_test.go lost coverage breadth
bundle/phases/plan_test.go
Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.
[Major] No validation for lifecycle.started on unsupported resource types
bundle/config/mutator/validate_lifecycle_started.go:30-46
Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.
[Gap (Major)] Acceptance test never tests started-only toggle
The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.
[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started
Only the app path is tested.
[Nit] Validation error doesn't identify which resource
validate_lifecycle_started.go:40-46 - Include resource key in error message.
[Nit] Duplicate lifecycle entries in schema output
out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.
[Nit] Redundant zero-value assignments in RemapState
app.go:93-100 - Explicit zero values are unnecessary in Go struct init.
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant).
Priority: HIGH — Several critical correctness issues
MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys
For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.
MAJOR: If only started changes on an app, DoUpdate is never called
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.
MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
type LifecycleWithStarted struct {
    PreventDestroy bool `json:"prevent_destroy,omitempty"`
    Started *bool `json:"started,omitempty"`
}
Should embed Lifecycle instead:
type LifecycleWithStarted struct {
    Lifecycle
    Started *bool `json:"started,omitempty"`
}
Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.
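A minimal sketch of the embedding suggestion, assuming Lifecycle carries only the prevent_destroy field quoted above (the real struct may have more). The `encode` helper is hypothetical and exists only to show that encoding/json flattens the embedded struct, so the serialized shape stays the same as with the duplicated field:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Lifecycle mirrors the base struct quoted in the review (assumed to have one field).
type Lifecycle struct {
	PreventDestroy bool `json:"prevent_destroy,omitempty"`
}

// LifecycleWithStarted embeds Lifecycle, so every Lifecycle field is promoted.
type LifecycleWithStarted struct {
	Lifecycle
	Started *bool `json:"started,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of l.
func encode(l LifecycleWithStarted) string {
	b, err := json.Marshal(l)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	started := true
	l := LifecycleWithStarted{
		Lifecycle: Lifecycle{PreventDestroy: true},
		Started:   &started,
	}
	fmt.Println(encode(l))        // {"prevent_destroy":true,"started":true}
	fmt.Println(l.PreventDestroy) // promoted field access: true
}
```

Any field later added to Lifecycle would appear in LifecycleWithStarted without further changes, which is the point of the review comment.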
MAJOR: Field shadowing creates duplicate lifecycle schema entries
Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:
resources.apps.*.lifecycle resources.Lifecycle INPUT
resources.apps.*.lifecycle resources.LifecycleWithStarted INPUT
MEDIUM: ILifecycle naming not idiomatic Go
The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.
MEDIUM: Lost parametric test coverage
The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.
MEDIUM: No unit tests for ValidateLifecycleStarted
The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.
What looks good
- appdeploy package extraction is a clean DRY improvement
- Test server additions are thorough with proper state management
- Schema and annotation descriptions are clear
- The overall feature design is well thought out
Focus areas for review
- Cluster/warehouse update path: started=true ineffective after creation
- App started-only toggle: silent no-op
- Field embedding: LifecycleWithStarted should embed Lifecycle
- Test coverage restoration
All of these are expected |
denik
left a comment
There was a problem hiding this comment.
Summary of offline discussion:
- We should have a test where we only change the config entry from started=false to started=true and vice versa. This should trigger only a Start/Stop call, not an Update call (we should record requests to confirm).
- started=false should not be the same as started omitted. It should mean stopped, and omitted should mean "don't care about start/stop status", which is backward compatible with current behaviour.
acceptance/bundle/resources/apps/lifecycle-started/out.deploy.direct.txt
@@ -0,0 +1,10 @@

>>> update_file.py databricks.yml my_app_description MY_APP_DESCRIPTION
There was a problem hiding this comment.
The test is a bit difficult to read because the update operations are separated from the actual applies / assertions in out.deploy.direct.txt. Can we inline these update operations there as well? No need for an output.txt here.
}

// Anonymous embedded structs are transparent in JSON; skip them as standalone fields.
if field.Anonymous {
There was a problem hiding this comment.
question - Why did we need this, it does not seem like we added any new required fields? The generated code did not change?
There was a problem hiding this comment.
We need this so that lifecycle is not added to the required fields. I presume it has something to do with Lifecycle also being defined in BaseResource, which is inlined in the App struct.
There was a problem hiding this comment.
Would it work if we do this:
Lifecycle *LifecycleWithStarted `json:"lifecycle,omitempty"`
instead of this:
Lifecycle LifecycleWithStarted `json:"lifecycle,omitempty"`
?
Then it should not be included in required. Pointer is more logical there given that we want the whole struct to be optional?
I'm worried skipping anonymous structs wholesale can have unintended consequences (nothing inside embedded struct is processed even if it should).
There was a problem hiding this comment.
*LifecycleWithStarted makes sense independently, but the problem is slightly different: the LifecycleWithStarted struct embeds Lifecycle anonymously (no json tag).
Without the check for a missing json tag, "lifecycle" gets added as required under "resources.apps.*". I adjusted the change in required.go to reflect that what we actually want is to skip fields with empty json tags.
request := apps.AsyncUpdateAppRequest{
    App: &config.App,
    AppName: id,
    UpdateMask: updateMask,
There was a problem hiding this comment.
What kind of format does updateMask support? Is it only top level fields or inner fields as well?
Changes' keys have indices, both integer (tasks[0]) and key-value (tasks[task_name="hello"]). I doubt that makes sense for the backend; the latter is very DABs-specific.
There was a problem hiding this comment.
Yes, it supports nested fields, but it does not allow specifying elements in sequence or map fields; only the entire collection field can be specified. I'll fix that.
There was a problem hiding this comment.
why calculate mask at all? our model is that we update full resource always, cannot we just use a statically configured mask? (or omit mask, not sure if that works).
There was a problem hiding this comment.
Cannot we just do this?
UpdateMask: []string{"budget_policy_id", "compute_size", "description", "resources", "status", "usage_policy_id", "user_api_scopes"}
There was a problem hiding this comment.
or even UpdateMask: "*"?
The docs say it's not recommended if new field is added to backend BUT
- it's an issue with all APIs that we use that don't use updateMask (most of them)
- to properly support a new field, we need to rebuild DAB with that field in SDK. Wiping it with default otherwise does not seem unreasonable.
bundle/direct/dresources/app.go
apps.App
Config *resources.AppConfig `json:"config,omitempty"`
GitSource *apps.GitSource `json:"git_source,omitempty"`
Lifecycle *AppStateLifecycle `json:"lifecycle,omitempty"`
There was a problem hiding this comment.
question - would it make sense to embed AppRemote into AppState? they overlap except for one field.
There was a problem hiding this comment.
We could, but it requires changes to findStructFieldByKey since it only walks 1 level of anonymous embeds. Double embedding (AppState -> AppRemote -> apps.App) breaks field access for apps.App fields like name.
"config": true,
"git_source": true,
"lifecycle": true,
"lifecycle.started": true,
There was a problem hiding this comment.
but these fields do have a remote counterpart now? (all but source_code_path, see separate q about that).
There was a problem hiding this comment.
They do, but they are managed via the Deploy API, not the App Update API, so they must not appear in update_mask. I'll update the comment.
title "Deploy bundle"
trace $CLI bundle deploy
trace $CLI bundle run my_app > /dev/null || true
There was a problem hiding this comment.
is it expected to fail? then we should use my_app.
If it fails sometimes, then I wonder what we get from having this command in the acceptance test?
@@ -0,0 +1,3 @@
# Run the app after the deploy otherwise migrate will show the drift on the source code path.
There was a problem hiding this comment.
This means that users of app will also experience drift until they do "run"?
Ideally we want to assert that right after deploy there is no drift.
Since it's a known behavior for this field, is it possible to account for it in resources.yml or in OverrideChangeDesc?
trace $CLI bundle plan
trace $CLI bundle deploy
trace $CLI bundle run mykey
trace print_requests >> out.requests.$DATABRICKS_BUNDLE_ENGINE.json
There was a problem hiding this comment.
are requests for run expected to differ between runs? From quick glance it does not seem so.
In that case, can we record a diff instead?
trace $CLI bundle run mykey > out.requests.run1.json
trace $CLI bundle run mykey > tmp.requests.run2.json
trace diff.py out.requests.run1.json tmp.requests.run2.json
We can also add a helper:
trace $CLI bundle run mykey > out.requests.run1.json
trace $CLI bundle run mykey | diff_again.py out.requests.run1.json
trace $CLI bundle plan
trace $CLI bundle deploy
trace $CLI bundle run mykey
There was a problem hiding this comment.
Are requests for run expected to differ between engines? It looks like they are the same. In that case, can we move them to non-engine specific dedicated files? e.g. out.requests.run1.json
    deployment.Command = config.Command
}

if len(config.Env) > 0 {
There was a problem hiding this comment.
nit: is the if necessary? The for-loop already handles an empty collection.
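The nit rests on a Go language guarantee: ranging over a nil or empty slice (or map) executes zero iterations, so a length guard around the loop adds nothing. A small sketch, with the caveat that `EnvVar` and `toMap` are hypothetical and the real type of config.Env is not shown in this thread:

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in for an environment entry.
type EnvVar struct {
	Name  string
	Value string
}

// toMap converts env to a map without any len(env) > 0 guard; the loop body
// simply never runs for a nil or empty slice.
func toMap(env []EnvVar) map[string]string {
	out := make(map[string]string, len(env))
	for _, e := range env {
		out[e.Name] = e.Value
	}
	return out
}

func main() {
	fmt.Println(len(toMap(nil)))                                // 0
	fmt.Println(toMap([]EnvVar{{Name: "MODE", Value: "prod"}})) // map[MODE:prod]
}
```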
// lifecycle.started is a direct-mode-only feature.
if !m.engine.IsDirect() {
    path := "resources." + group.Description.PluralName + "." + key + ".lifecycle.started"
    diags = diags.Append(diag.Diagnostic{
There was a problem hiding this comment.
nit: logdiag.LogError does not require collection.
}

// lifecycle.started is a direct-mode-only feature.
if !m.engine.IsDirect() {
There was a problem hiding this comment.
Can this check be moved to the top level? We don't need the iteration for the direct engine.
if app.ActiveDeployment != nil {
    // The source code path in active deployment is snapshotted version of the source code path in the app.
    // We need to use the default source code path to get the correct source code path for drift detection.
    remote.SourceCodePath = app.DefaultSourceCodePath
There was a problem hiding this comment.
Question, why not always set SourceCodePath to app.DefaultSourceCodePath? (even if app.ActiveDeployment="")
Changes
Added support for lifecycle.started option
Why
This new option allows resources such as apps, clusters, and SQL warehouses to be deployed in a started/active state.
For apps: when this option is enabled, each bundle deploy automatically triggers a new app deployment.
Tests
Added an acceptance test