Refining Sandbox Status for Lifecycle Management #525

@SHRUTI6991

Current Behavior

Currently, Sandbox.Status relies primarily on a single Ready condition.

  • When the Sandbox is Active, Ready is True.

  • When the Sandbox is Suspended (replicas set to 0), the underlying Pod is deleted.

  • In this state, the Ready condition typically falls back to False or goes stale. No explicit machine-readable field distinguishes an intended "Suspended" state from an unintended "Failed/Pending" state.

The Problem

The lack of an explicit status causes two primary issues:

  • Client-Side Ambiguity: A developer building a UI or CLI for Sandbox cannot tell whether the sandbox is being "cleaned up," is "voluntarily suspended," or is "failing to start." They are forced to write complex logic that checks spec.replicas and cross-references it with the absence of a Pod, for example when implementing methods like suspend and resume.

  • Loss of Intent: Since the "Suspended" state results in the deletion of the Pod, the Sandbox object loses its ability to report its own state. The status should reflect the state of the Sandbox resource, not just a proxy for the Pod's health.
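To make the ambiguity concrete, here is a minimal sketch of the inference a client has to perform today. The Sandbox struct, its field names, and the inferState helper are hypothetical stand-ins for illustration, not the project's actual API.

```go
package main

import "fmt"

// Sandbox is a hypothetical, trimmed-down view of the resource.
// Only the fields this sketch needs are modeled.
type Sandbox struct {
	SpecReplicas int32 // assumed to mirror spec.replicas
	Ready        bool  // status of the existing Ready condition
}

// inferState guesses the lifecycle state by combining spec.replicas,
// the Ready condition, and whether a Pod currently exists.
func inferState(sb Sandbox, podExists bool) string {
	switch {
	case sb.SpecReplicas == 0 && podExists:
		return "Suspending" // Pod is presumably still terminating
	case sb.SpecReplicas == 0:
		// Checked before Ready because Ready may be stale here.
		return "Suspended"
	case sb.Ready:
		return "Active"
	default:
		// replicas > 0 but not Ready: starting, resuming, or failing.
		// The current status gives no way to tell these apart.
		return "Unknown"
	}
}

func main() {
	fmt.Println(inferState(Sandbox{SpecReplicas: 0, Ready: false}, false))
	fmt.Println(inferState(Sandbox{SpecReplicas: 1, Ready: false}, false))
}
```

The default branch is the core of the problem: a sandbox that is resuming and one that is failing look identical to the client.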

The Proposed Change

Instead of just a binary Ready condition, we introduce a structured status approach (similar to Pod Conditions or Deployment Status) that explicitly captures the lifecycle intent.

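| Sandbox State | Pod Presence     | Ready Condition | Suspended Condition | Ready Reason (Example) |
|---------------|------------------|-----------------|---------------------|------------------------|
| Active        | Exists / Running | True            | False               | SandboxPodReady        |
| Suspending    | Terminating      | False           | True                | SandboxPodScalingDown  |
| Suspended     | Absent           | False           | True                | SandboxPodDeleted      |
| Resuming      | Creating         | False           | False               | SandboxPodInitializing |
| Failed        | Crashing         | False           | False               | SandboxPodNotReady     |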
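A client could consume the table above with a plain lookup. The sketch below is one possible encoding, not a proposed API: the Condition struct is a minimal stand-in for metav1.Condition, lifecycleState is a hypothetical helper, and the Reason strings are the example values from the table. In the table as written, Suspending/Suspended and Resuming/Failed share the same Ready and Suspended values, so the sketch includes the Ready Reason in the lookup key.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
	Reason string
}

// row is one line of the status table: the two condition values plus
// the (example) Reason reported on the Ready condition.
type row struct {
	ready, suspended bool
	readyReason      string
}

// stateTable mirrors the table in the proposal.
var stateTable = map[row]string{
	{true, false, "SandboxPodReady"}:         "Active",
	{false, true, "SandboxPodScalingDown"}:   "Suspending",
	{false, true, "SandboxPodDeleted"}:       "Suspended",
	{false, false, "SandboxPodInitializing"}: "Resuming",
	{false, false, "SandboxPodNotReady"}:     "Failed",
}

// find returns the condition with the given type, if present.
func find(conds []Condition, condType string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return Condition{}, false
}

// lifecycleState maps a set of conditions to a table row.
// Combinations that are not in the table yield "Unknown".
func lifecycleState(conds []Condition) string {
	ready, _ := find(conds, "Ready")
	suspended, _ := find(conds, "Suspended")
	key := row{ready.Status == "True", suspended.Status == "True", ready.Reason}
	if state, ok := stateTable[key]; ok {
		return state
	}
	return "Unknown"
}

func main() {
	conds := []Condition{
		{Type: "Ready", Status: "False", Reason: "SandboxPodDeleted"},
		{Type: "Suspended", Status: "True"},
	}
	fmt.Println(lifecycleState(conds))
}
```

A client that only needs the coarse signal (is suspension intended or not) can ignore the Reason and read the Suspended condition alone.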

Alternatives Considered

Option: Adding a new Reason to the existing Ready condition
One alternative is to keep the single Ready condition and simply update the Reason field to something like SandboxSuspended when the replicas are 0.

Why this was rejected:

  • Semantic Overloading: In Kubernetes, Ready is typically a binary indicator of whether a resource is currently operable. Using it to represent a desired state (Suspension) makes the status ambiguous for automated controllers.

  • Machine Consumption (The "Regex" Problem): Clients would have to parse the Reason string to determine the actual state of the Sandbox. Conditions are designed so that the Type and Status provide the high-level signal, while Reason provides the "why."

  • Loss of State History: If we only use one condition, we lose the ability to see multiple truths simultaneously (e.g., a Sandbox can be "Not Ready" because it is "In Progress" AND "Suspended"). Separate conditions allow for a much richer status history, because each condition carries its own lastTransitionTime.
