lakebox: explicitly start stopped sandbox before ssh #5429

Merged
akshaysingla-db merged 2 commits into databricks:demo-lakebox from akshaysingla-db:akshay/lakebox-ssh-autostart
Jun 4, 2026


@akshaysingla-db akshaysingla-db commented Jun 4, 2026

Summary

Replaces the bare warn(ctx, "Starting <id>…") notice in lakebox ssh with an explicit api.start + waitForRunning sequence before exec-ing ssh.

Why

The principal that triggers the start should be the user, not the gateway. Today the SSH gateway implicitly starts a stopped sandbox on connect, so audit trails, billing attribution, and any server-side hooks see the gateway service principal as the actor rather than the human who actually wanted the sandbox up. Calling api.start from the CLI sends the request with the workspace credential of the user running the command, so the StartSandbox RPC is attributed to them.

Nice-to-haves that fall out for free:

  • Visible spinner with elapsed seconds instead of an opaque "ssh hangs for 5 minutes" wait.
  • Deterministic timeout (the same one databricks lakebox start uses) rather than racing the SSH gateway's cold-start window.
  • Already-transitioning states (Creating, Starting) get polled cleanly via the same ensureRunning helper — no double-start RPC.
  • The fresh sandboxEntry we get back lets us refresh the cached GatewayHost if it changed during the start.

User-visible change

Before:

! Starting happy-panda-1234… (may take a few minutes)
⠋ Connecting to happy-panda-1234…   ← hangs silently, can time out at the ssh layer

After:

⠋ Starting happy-panda-1234… (8s)
⠋ Starting happy-panda-1234… (10s)
…
✓ Started happy-panda-1234
⠋ Connecting to happy-panda-1234…   ← fast, sandbox already running

Test plan

  • go test ./cmd/lakebox/... passes
  • go build ./... clean
  • Live test against a stopped sandbox (deferred — 5–10 min wall clock per attempt)
  • Confirm StartSandbox audit-log entry is attributed to the user, not the gateway

This pull request and its description were written by Isaac.

Mirrors the "creating" treatment so transient states are visually
distinct from terminal ones in `list` and `status` output.

Co-authored-by: Isaac
Replace the bare warning "Starting <id>… (may take a few minutes)"
with an explicit api.start + waitForRunning sequence before exec'ing
ssh. The SSH gateway already auto-starts a stopped sandbox on connect,
but that path is opaque — ssh just hangs for minutes with no
progress — and races the cold-start timeout. Driving the start
ourselves gives the user a visible spinner with elapsed time and a
deterministic timeout (the same one `databricks lakebox start` uses).

The new ensureRunning helper also handles already-transitioning
states (Creating, Starting): it skips the start RPC and just polls.

Co-authored-by: Isaac

github-actions Bot commented Jun 4, 2026

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in cmd/lakebox/

Eligible reviewers: @andrewnester, @anton-107, @denik, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@akshaysingla-db akshaysingla-db merged commit 41d1215 into databricks:demo-lakebox Jun 4, 2026
10 of 25 checks passed