
Add disk update action via CPI #2701

Open
neddp wants to merge 7 commits into main from add_disk_update_via_cpi

Conversation

@neddp
Member

@neddp neddp commented Mar 26, 2026

What is this change about?

This PR adds the ability to update persistent disk properties (size, type, cloud_properties) via the CPI while disks are already in a detached state during VM recreation. This avoids a redundant detach→update→attach cycle when the disk is already detached as part of the recreation flow.

Changes:

  • New update_detached_disks method in DiskManager: Called by RecreateHandler after disks are detached but before the old VM is deleted. Attempts a CPI-level update_disk while the disk is already detached. If the CPI doesn't support update_disk, it gracefully falls back; the subsequent update_persistent_disk call handles disk changes via the standard copy path after recreation.

  • Pass disk_manager into RecreateHandler: Previously, RecreateHandler only handled VM lifecycle (detach disks → delete VM → create VM → attach disks) using stateless Step classes. It had no access to DiskManager because it didn't need it. Now it receives disk_manager as an optional keyword argument to enable the new detached-disk update path.

  • Handle new disk CID from update_disk return value: The CPI's update_disk may return a new disk CID (e.g., GCP replaces the disk via snapshot when changing disk types). Both update_detached_disks and update_disk_cpi now capture and persist the returned CID. Previously the return value was discarded.

  • Staleness guard in update_disk_cpi: After update_detached_disks succeeds and updates the DB model, the subsequent update_persistent_disk call would redundantly invoke the CPI again (especially when recreate_persistent_disks is set). A new guard skips the CPI call when the disk model already matches the desired size and cloud_properties.

  • Consolidate DB updates: Merged separate update(size:) and update(cloud_properties:) calls into a single update(updates) call, including the optional disk_cid when it changes. Reduces DB roundtrips from 2-3 to 1.

  • Nil guard for active_vm: Added a defensive check in update_detached_disks in case the instance has no active VM during recreation.
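The per-disk logic of the detached update path can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the PR's code: DiskModel, DesiredDisk, update_detached_disk, the fake clouds, and the NotImplementedCpiError/NotSupportedCpiError classes are hypothetical stand-ins for the Sequel models, the CPI client, and the bosh_cpi error classes. It shows the nil guard for the active VM, the fallback when the CPI lacks update_disk, and a single consolidated update that includes disk_cid only when the CPI returns a different one.

```ruby
# Hypothetical stand-ins for the CPI "not implemented" / "not supported" errors.
class NotImplementedCpiError < StandardError; end
class NotSupportedCpiError < StandardError; end

# Simplified disk model; update(changes) applies everything in one call,
# mirroring the single consolidated DB update.
DiskModel = Struct.new(:disk_cid, :size, :cloud_properties, :update_calls) do
  def update(changes)
    self.update_calls += 1
    changes.each { |key, value| self[key] = value }
  end
end

DesiredDisk = Struct.new(:size, :cloud_properties)

# Sketch of the per-disk work in update_detached_disks (hypothetical signature).
def update_detached_disk(cloud, disk, desired, active_vm:, logger: [])
  # Defensive nil guard: skip (and log) when the instance has no active VM.
  if active_vm.nil?
    logger << "No active VM for disk #{disk.disk_cid}, skipping CPI update"
    return :skipped
  end

  returned_cid = cloud.update_disk(disk.disk_cid, desired.size, desired.cloud_properties)

  updates = { size: desired.size, cloud_properties: desired.cloud_properties }
  # Persist a new CID only when the CPI returned one that differs.
  updates[:disk_cid] = returned_cid if returned_cid && returned_cid != disk.disk_cid
  disk.update(updates)
  :updated
rescue NotImplementedCpiError, NotSupportedCpiError
  # Fallback: leave the model untouched so the later update_persistent_disk
  # call handles the change through the standard copy path.
  :fallback
end

# Fake CPIs for the three cases discussed in the PR.
class ReplacingCloud # disk replaced (e.g. via snapshot): returns a new CID
  def update_disk(_cid, _size, _props)
    "disk-new"
  end
end

class InPlaceCloud # in-place update: returns nothing
  def update_disk(_cid, _size, _props)
    nil
  end
end

class UnsupportedCloud # CPI without update_disk support
  def update_disk(_cid, _size, _props)
    raise NotImplementedCpiError, "update_disk"
  end
end
```

With ReplacingCloud, a model starting at disk_cid "disk123" ends up with "disk-new" after exactly one update call; with InPlaceCloud the CID stays "disk123" while size and cloud_properties change; with UnsupportedCloud the method returns :fallback and the model is left as it was.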

Please provide contextual information.

This is part of the GCP migration from N2 to N4 machine types we are working on. Since the disk_type must be changed, and in-place change is not supported, a CPI change was also added:
cloudfoundry/bosh-google-cpi-release#382

What tests have you run against this PR?

  • Unit tests
  • Tested in a dev deployment

How should this change be described in bosh release notes?

When enable_cpi_update_disk is enabled, the Director now updates persistent disk properties (size, type) via the CPI during VM recreation while the disk is already detached, avoiding a redundant detach-update-attach cycle. The CPI's update_disk may return a new disk CID (e.g., when the disk is replaced), which is now correctly persisted. If the CPI does not support update_disk, the Director falls back to the standard disk copy path transparently.

Does this PR introduce a breaking change?

No. All changes are behind the existing enable_cpi_update_disk feature flag (default: false). The RecreateHandler constructor change uses a keyword argument with a default of nil, so existing callers are unaffected. When the feature is disabled, behavior is identical to before this PR.
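The backward-compatibility argument rests on a Ruby keyword argument that defaults to nil. The following is a hypothetical, heavily simplified handler (RecreateHandlerSketch and RecordingDiskManager are illustrative names, and the real RecreateHandler constructor takes other collaborators and runs Step classes). It shows that a caller that omits disk_manager: gets the old step sequence, while a caller that passes it, as UpdateProcedure now does, gets the extra update right after detach_disks and before the old VM is deleted.

```ruby
# Hypothetical, heavily simplified handler: the real RecreateHandler takes
# other collaborators and runs Step classes instead of logging symbols.
class RecreateHandlerSketch
  def initialize(instance_plan, step_log, disk_manager: nil)
    @instance_plan = instance_plan
    @step_log = step_log
    @disk_manager = disk_manager
  end

  def perform
    @step_log << :detach_disks
    update_detached_disks # disks are detached, old VM not yet deleted
    @step_log << :delete_vm
    @step_log << :create_vm
    @step_log << :attach_disks
  end

  private

  # No-op when no disk manager was passed, i.e. the pre-existing behavior.
  def update_detached_disks
    return if @disk_manager.nil?

    @disk_manager.update_detached_disks(@instance_plan)
    @step_log << :update_detached_disks
  end
end

# Test double standing in for DiskManager.
class RecordingDiskManager
  attr_reader :plans

  def initialize
    @plans = []
  end

  def update_detached_disks(instance_plan)
    @plans << instance_plan
  end
end
```

For example, RecreateHandlerSketch.new(:plan, steps).perform records only detach, delete, create, and attach, exactly as existing callers expect.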


Claude was used to write the code.

@neddp neddp changed the title Add disk update via cpi Add disk update option via CPI Mar 26, 2026
@neddp neddp changed the title Add disk update option via CPI Add disk update action via CPI Mar 26, 2026
rkoster previously approved these changes Mar 26, 2026
@rkoster
Contributor

rkoster commented Mar 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5f1b3629-1ce2-4922-91f6-29564d463ee5

📥 Commits

Reviewing files that changed from the base of the PR and between 66f6142 and ff549fb.

📒 Files selected for processing (1)
  • src/bosh-director/spec/unit/bosh/director/disk_manager_spec.rb

📝 Walkthrough

Adds a new DiskManager flow to update detached persistent disks via CPI during VM recreation, integrates it into RecreateHandler and UpdateProcedure, consolidates DiskManager model updates and CID handling, and adds comprehensive unit tests for the new behaviors and edge cases.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Disk Manager Core: src/bosh-director/lib/bosh/director/disk_manager.rb | Added update_detached_disks(instance_plan) to run CPI update_disk for detached persistent disks when feature-flagged and changed. Enhanced update_disk_cpi to short-circuit when no change, interpolate cloud properties with versioning, capture CPI-returned disk_cid, and consolidate model updates (size, cloud_properties, optional new disk_cid) into a single update before reattach. |
| Recreate Handler Integration: src/bosh-director/lib/bosh/director/instance_updater/recreate_handler.rb | initialize gained optional disk_manager: keyword; perform now calls a new private update_detached_disks immediately after detach_disks, which delegates to disk_manager.update_detached_disks(instance_plan) when provided. |
| Update Procedure Coordination: src/bosh-director/lib/bosh/director/instance_updater/update_procedure.rb | When recreate is true, converge_vm now constructs RecreateHandler with disk_manager: @disk_manager (and uses an explicit if recreate block for clarity). |
| Disk Manager Test Coverage: src/bosh-director/spec/unit/bosh/director/disk_manager_spec.rb | Added tests for update_detached_disks and extended update_persistent_disk tests: verify CPI invocation/arguments, persisted updates for size/cloud_properties, handling when CPI returns new/same/nil CID, behavior when CPI raises NotImplemented/NotSupported, feature-flag no-op, skip when no active VM, and early-exit when disk already matches desired state. |
| Update Procedure Test Coverage: src/bosh-director/spec/unit/bosh/director/instance_updater/update_procedure_spec.rb | Updated test doubles to include unresponsive_agent?: false and added expectation that RecreateHandler is constructed with disk_manager: disk_manager in the recreate scenario. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant UP as UpdateProcedure
    participant RH as RecreateHandler
    participant DM as DiskManager
    participant CPI as Cloud (CPI)
    participant DB as Database

    UP->>RH: new(..., disk_manager)
    UP->>RH: perform()
    RH->>RH: detach_disks()
    alt disk_manager present
        RH->>DM: update_detached_disks(instance_plan)
        alt enable_cpi_update_disk && persistent_disk_changed?
            DM->>DB: load detached persistent disk models
            DM->>CPI: update_disk(disk_cid, size, cloud_properties)
            CPI-->>DM: disk_cid (new / same / nil)
            alt disk_cid changed
                DM->>DB: update model(size, cloud_properties, disk_cid)
            else
                DM->>DB: update model(size, cloud_properties)
            end
        else
            DM->>DM: no-op (flag/unchanged)
        end
    else
        RH->>RH: skip disk CPI updates
    end
    RH->>RH: delete_vm() / create_vm()
    RH->>RH: attach_disks()
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through detached disks with nimble paws,
CPI hummed softly as I checked all the laws,
Sizes and props lined up, some CIDs changed hue,
Models updated tidy, no reattach to rue,
A rabbit’s small cheer for safe disk renew.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Add disk update action via CPI' directly and concisely summarizes the main change: adding CPI-based disk update functionality during VM recreation. |
| Description check | ✅ Passed | The description comprehensively covers all required template sections: change rationale, contextual links, testing approach, release notes, breaking change assessment, and team tags. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bosh-director/lib/bosh/director/disk_manager.rb`:
- Around line 85-90: The update path is passing new_disk.cloud_properties
directly to the CPI (see update_disk call), causing unresolved ((...))
placeholders; fix update_disk_cpi to interpolate new_disk.cloud_properties the
same way the create path does before calling cloud.update_disk. Locate
update_disk_cpi (and the update_disk call that uses new_disk.cloud_properties),
apply the same interpolation helper used by create_disk to produce resolved
cloud_properties, and pass the resolved properties to cloud.update_disk and to
the updates hash.
- Around line 309-312: The stale-state guard in DiskManager that returns early
when old_disk_model.size == new_disk.size && old_disk_model.cloud_properties ==
new_disk.cloud_properties skips CPI updates when only variable-interpolated
values changed; change the check to use the variable-set-aware comparison
already produced for this flow (use changed_disk_pairs or the resolved
cloud_properties for old vs new) rather than comparing raw
new_disk.cloud_properties, so that interpolation-only changes trigger the CPI
update for disk_cid (referencing old_disk_model, new_disk, changed_disk_pairs,
and disk_cid in your patch).
- Around line 327-335: The attach happens before persisting a new disk CID,
causing the VM to be reattached with a stale CID; move or apply the updates
(i.e., persist new_disk_cid and any other changes) to old_disk_model before
calling attach_disk, or explicitly set old_disk_model.disk_cid to new_disk_cid
prior to attach_disk(old_disk_model, instance_plan.tags); update the logic
around new_disk_cid, attach_disk, and old_disk_model.update to ensure the model
reflects the returned CID at attach time, and extend the new-CID spec to assert
the attach step uses the returned CID.
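The third finding is about ordering: the returned CID has to be on the model before the disk is reattached. A minimal sketch, assuming a hypothetical PersistentDiskRecord struct and a recording attacher in place of the real Sequel model and attach_disk step. It persists the consolidated update first and then attaches, which is what the requested spec extension would assert.

```ruby
# Hypothetical simplified model; update(changes) applies all changes at once.
PersistentDiskRecord = Struct.new(:disk_cid, :size, :cloud_properties) do
  def update(changes)
    changes.each { |key, value| self[key] = value }
  end
end

# Hypothetical tail of update_disk_cpi: CPI call, persist, then reattach.
def update_and_reattach(cloud, disk, new_size, new_props, attacher)
  returned_cid = cloud.update_disk(disk.disk_cid, new_size, new_props)

  updates = { size: new_size, cloud_properties: new_props }
  updates[:disk_cid] = returned_cid if returned_cid && returned_cid != disk.disk_cid

  # Persist first: attaching before this would reuse the stale CID,
  # which may refer to a disk the CPI has already replaced.
  disk.update(updates)
  attacher.attach(disk)
end

# Fake CPI that replaces the disk and returns a new CID.
class SnapshotReplacingCloud
  def update_disk(_cid, _size, _props)
    "disk-replacement"
  end
end

# Records which CID each attach call saw.
class RecordingAttacher
  attr_reader :attached_cids

  def initialize
    @attached_cids = []
  end

  def attach(disk)
    @attached_cids << disk.disk_cid
  end
end
```

Swapping the disk.update and attacher.attach lines reproduces the reported bug: the attacher would then record "disk123" instead of "disk-replacement".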

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04b9488c-58f8-48c7-997d-1e59f987c776

📥 Commits

Reviewing files that changed from the base of the PR and between 1590d53 and fd28702.

📒 Files selected for processing (5)
  • src/bosh-director/lib/bosh/director/disk_manager.rb
  • src/bosh-director/lib/bosh/director/instance_updater/recreate_handler.rb
  • src/bosh-director/lib/bosh/director/instance_updater/update_procedure.rb
  • src/bosh-director/spec/unit/bosh/director/disk_manager_spec.rb
  • src/bosh-director/spec/unit/bosh/director/instance_updater/update_procedure_spec.rb

@github-project-automation github-project-automation bot moved this from Pending Merge | Prioritized to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Mar 27, 2026
neddp added 2 commits March 27, 2026 12:00
Both update_detached_disks and update_disk_cpi were passing raw
cloud_properties to the CPI, which could contain unresolved variable
placeholders like ((...)). The create_disk path already resolves
these via variables_interpolator.interpolate_with_versioning; do the
same for the update paths.
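The point of this commit is that the CPI should receive fully resolved cloud_properties rather than ((...)) placeholders. A minimal sketch using a hypothetical interpolate_props helper that resolves ((name)) placeholders from a plain hash; it is only a stand-in for variables_interpolator.interpolate_with_versioning, which in the Director resolves variables against a versioned variable set, and that behavior is not reproduced here.

```ruby
# Matches a ((name)) placeholder anywhere in a string.
PLACEHOLDER = /\(\(([^()]+)\)\)/

# Hypothetical stand-in: resolves ((name)) placeholders from a plain hash,
# recursing through nested hashes and arrays. Missing variables raise KeyError.
def interpolate_props(value, variables)
  case value
  when Hash
    value.to_h { |k, v| [k, interpolate_props(v, variables)] }
  when Array
    value.map { |v| interpolate_props(v, variables) }
  when String
    if (match = value.match(/\A\(\(([^()]+)\)\)\z/))
      variables.fetch(match[1]) # whole-string placeholder keeps the value's type
    else
      value.gsub(PLACEHOLDER) { variables.fetch(Regexp.last_match(1)).to_s }
    end
  else
    value
  end
end
```

The update paths would then call something like cloud.update_disk(disk_cid, size, interpolate_props(cloud_properties, variables)) instead of passing the raw hash, leaving the original hash untouched.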

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bosh-director/spec/unit/bosh/director/disk_manager_spec.rb`:
- Around line 906-926: Add a unit test for the defensive branch where a disk has
no active VM: in the spec for disk_manager.update_detached_disks, create a
context "when there is no active VM for the disk", remove the active VM from
instance_model (e.g. call instance_model.active_vm.destroy and reload), then
call disk_manager.update_detached_disks(instance_plan) and assert that
per_spec_logger receives a warn matching /No active VM for disk/,
cloud.update_disk is not called, and the Models::PersistentDisk record (disk_cid
'disk123') still has the original size (e.g. 2048) to ensure the update is
skipped and state unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cef0edd0-50d9-4cd9-9fe7-38ffcca4d6a1

📥 Commits

Reviewing files that changed from the base of the PR and between fd28702 and 66f6142.

📒 Files selected for processing (2)
  • src/bosh-director/lib/bosh/director/disk_manager.rb
  • src/bosh-director/spec/unit/bosh/director/disk_manager_spec.rb

@github-project-automation github-project-automation bot moved this from Waiting for Changes | Open for Contribution to Pending Merge | Prioritized in Foundational Infrastructure Working Group Mar 27, 2026
@s4heid
Contributor

s4heid commented Mar 27, 2026

@neddp Perhaps I am missing some context here, but as far as I can see, we had the same, or at least a similar, requirement for Azure some time ago and added native disk update support via the BOSH parameter enable_cpi_update_disk. The CPI needs to support this by implementing the update_disk method.
So, in theory, you could just implement this method in the GCP CPI. Here is the PR that does that for the Azure CPI.

@neddp
Member Author

neddp commented Mar 27, 2026

Hi @s4heid,

This builds on top of your initial implementation, but the existing Azure functionality supports only in-place updates and does not return anything. In this particular GCP scenario, an in-place update is not possible. (A similar discussion was started for Azure as well: cloudfoundry/bosh-azure-cpi-release#699)

This PR, along with the GCP CPI changes, takes a disk snapshot and then recreates the disk with a different type. The CPI's update_disk then returns the new CID, and the director (with the current changes) updates it in the DB so the disk can be reattached.

Please correct me if I'm wrong.

@s4heid
Contributor

s4heid commented Mar 27, 2026

Got it - this makes sense to me.


Labels

None yet

Projects

Status: Pending Merge | Prioritized

4 participants