Add top level run_as support for DLT pipelines #3307

Merged
shreyas-goenka merged 6 commits into main from run-as-pipeline
Sep 2, 2025
@shreyas-goenka shreyas-goenka commented Jul 28, 2025

Changes

This PR adds support for top-level run_as for pipelines in DABs. Previously, DABs would error if pipelines were used with the run_as field set.

Now, we'll transparently read the run_as value and set it for all pipelines in the bundle.
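
To illustrate the behavior described above, here is a minimal Python sketch that models a bundle configuration as a plain dictionary and copies the top-level run_as value onto every pipeline. It is not the CLI's actual implementation (the CLI is not written in Python): the `apply_run_as` helper, the dictionary layout, and the placeholder service principal ID are all assumptions made for this example.

```python
from copy import deepcopy


def apply_run_as(bundle: dict) -> dict:
    """Return a copy of the bundle with top-level run_as set on every pipeline.

    Hypothetical helper for illustration only; the dictionary layout is an
    assumption and does not mirror the CLI's internal types.
    """
    result = deepcopy(bundle)
    run_as = result.get("run_as")
    if not run_as:
        # No top-level run_as: pipelines are left untouched.
        return result

    pipelines = result.get("resources", {}).get("pipelines", {})
    for pipeline in pipelines.values():
        # Each pipeline gets its own copy of the identity.
        pipeline["run_as"] = dict(run_as)
    return result


# Example bundle with a placeholder service principal ID (assumed value).
bundle = {
    "run_as": {"service_principal_name": "00000000-0000-0000-0000-000000000000"},
    "resources": {
        "pipelines": {
            "ingest": {"name": "ingest"},
            "transform": {"name": "transform"},
        }
    },
}

resolved = apply_run_as(bundle)
```

In this sketch the input is left unmodified and both pipelines in `resolved` carry the same run_as identity as the bundle's top level.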

Tests

New acceptance tests. Also manually tested that the run_as feature works as expected.

eng-dev-ecosystem-bot commented Jul 28, 2025

Run: 17406750859

| Env | ✅ pass | 🔄 flaky | 🙈 skip |
|---|---|---|---|
| ✅ aws linux | 308 | | 512 |
| ✅ aws windows | 309 | | 511 |
| ✅ aws-ucws linux | 420 | | 410 |
| ✅ aws-ucws windows | 421 | | 409 |
| ✅ azure linux | 308 | | 511 |
| ✅ azure windows | 309 | | 510 |
| ✅ azure-ucws linux | 420 | | 409 |
| ✅ azure-ucws windows | 421 | | 408 |
| 🔄 gcp linux | 304 | 3 | 513 |
| ✅ gcp windows | 308 | | 512 |

| Test Name | gcp linux |
|---|---|
| TestAccept | 🔄 flaky |
| TestAccept/bundle/templates/default-python/integration_classic | 🔄 flaky |
| TestAccept/bundle/templates/default-python/integration_classic/DATABRICKS_CLI_DEPLOYMENT=direct-exp/UV_PYTHON=3.10 | 🔄 flaky |

github-merge-queue bot pushed a commit that referenced this pull request Jul 30, 2025
## Why
Since we'll soon be adding support for run_as for DLT pipelines we can
remove support for the legacy mode.
#3307

Tracking usage will inform us of the impact from removing this option. 

## Tests
New test and existing ones.
@shreyas-goenka shreyas-goenka marked this pull request as ready for review September 1, 2025 09:35

shreyas-goenka commented Sep 2, 2025

Recording the discussion from Slack.

Pieter Noordhuis:
Since this is net-new, it seems good.
I have questions:

  1. What happens if you change from no run_as to setting run_as, if a pipeline has already created tables? I recall a thread with the team saying they would perform a blocking "chown", which can be error prone.
  2. What happens in the reverse, if you try to go from having a run_as to not having one?
  3. Is there a path to get rid of the experimental flag and get existing users to use this?

Shreyas Goenka:

> I recall a thread with the team saying they would perform a blocking "chown"

That's exactly what happens. The table's ownership is changed to the run_as identity. I only tried this once but I did not observe any errors during that run.

> What happens in the reverse, if you try to go from having a run_as to not having one?

That is not allowed by the backend.

> Is there a path to get rid of the experimental flag and get existing users to use this?

There are two options:

  1. Transparently onboard legacy users. Legacy user pipelines have owners set to the run_as identity. Transparently onboarding them would mean changing the owner to the deployment identity and then setting run_as for the pipeline. Denys Kuznietsov agreed that this should be safe, but I'd need to validate whether there are failure modes in this.
  2. Make it explicit and provide a migration guide for them. As long as we can decouple the tables' lifecycle from the DLT pipeline (DPM mode), we should be able to do this safely. This needs investigation.

For me, it's also a question of prioritization. Right now, removing the legacy option does not seem high enough priority compared to the other things we have going on.

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Sep 2, 2025
Merged via the queue into main with commit 3c03fdd Sep 2, 2025
13 checks passed
@shreyas-goenka shreyas-goenka deleted the run-as-pipeline branch September 2, 2025 15:07
shreyas-goenka added a commit that referenced this pull request Sep 3, 2025
Add top level run_as support for DLT pipelines

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Sep 3, 2025
## Summary

This PR adds a changelog entry for PR #3307 which implemented top level
`run_as` support for DLT pipelines in Databricks Asset Bundles (DABs).

## Changes

- Added entry to `NEXT_CHANGELOG.md` under the Bundles section
documenting the new top level `run_as` support for DLT pipelines

## Context

PR #3307 enabled DABs to support the `run_as` field at the top level for
DLT pipelines. Previously, DABs would error if pipelines were used with
the `run_as` field set. Now the CLI transparently reads the `run_as`
value and applies it to all pipelines in the bundle.

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
deco-sdk-tagging bot added a commit that referenced this pull request Sep 3, 2025
## Release v0.267.0

### CLI
* Introduce retries to `databricks psql` command ([#3492](#3492))
* Add rule files for coding agents working on the CLI code base ([#3245](#3245))

### Dependency updates
* Upgrade TF provider to 1.88.0 ([#3529](#3529))
* Upgrade Go SDK to 0.82.0

### Bundles
* Update default-python template to make DB Connect work out of the box for unit tests, using uv to install dependencies ([#3254](#3254))
* Add support for `TaskRetryMode` for continuous jobs ([#3529](#3529))
* Add support for specifying database instance as an application resource ([#3529](#3529))
* Allow referencing job libraries outside bundle root without the need to specify sync root ([#2842](#2842))
* Add top level `run_as` support for Lakeflow Declarative Pipelines ([#3307](#3307))