
Add clustered migration sync for shared disks (SYNCING barrier)#403

Open
fabi200123 wants to merge 1 commit into cloudbase:master from fabi200123:cor-707

@fabi200123 (Contributor)

Introduce TASK_STATUS_SYNCING and TASK_TYPES_REQUIRING_CLUSTER_SYNC (DEPLOY_TRANSFER_DISKS, SHUTDOWN_INSTANCE) so that multi-instance transfers with base_transfer_action.clustered=True wait for all peer tasks of the same type before a task is marked COMPLETED and its dependents advance.
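A minimal Python sketch of the SYNCING barrier, assuming a simple in-memory task model. The Task dataclass, the status strings and on_task_completed are hypothetical stand-ins rather than the conductor's actual code, which persists task state in the database and would need locking around the "all peers are SYNCING" check. The type set mirrors the one named in this description (the later commit message uses a different set).

```python
from dataclasses import dataclass

# Hypothetical constants mirroring the names used in the PR description.
TASK_STATUS_RUNNING = "RUNNING"
TASK_STATUS_SYNCING = "SYNCING"
TASK_STATUS_COMPLETED = "COMPLETED"
TASK_TYPES_REQUIRING_CLUSTER_SYNC = {"DEPLOY_TRANSFER_DISKS", "SHUTDOWN_INSTANCE"}


@dataclass
class Task:
    id: str
    instance: str
    task_type: str
    status: str = TASK_STATUS_RUNNING


def on_task_completed(task, peers, clustered):
    """Return the tasks that became COMPLETED because `task` finished.

    `peers` is assumed to hold every task of the same execution,
    including `task` itself.
    """
    if not clustered or task.task_type not in TASK_TYPES_REQUIRING_CLUSTER_SYNC:
        # No barrier: complete immediately, as before this change.
        task.status = TASK_STATUS_COMPLETED
        return [task]

    # Park the task at the barrier instead of completing it.
    task.status = TASK_STATUS_SYNCING
    same_type = [t for t in peers if t.task_type == task.task_type]
    if all(t.status == TASK_STATUS_SYNCING for t in same_type):
        # The last peer has arrived: release the whole group together.
        for t in same_type:
            t.status = TASK_STATUS_COMPLETED
        return same_type
    return []
```

Only the last arriving peer releases the group, so dependents of any one instance cannot start until every instance has reached the same point.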

  • Add a clustered boolean column to base_transfer_action (DB migration 024)
  • Plumb clustered through create_instances_transfer, the REST transfers API and deployment creation (deployments inherit it from their parent transfer)
  • On task_completed: set the task to SYNCING when the barrier applies; once all peers are SYNCING, finalize them (for deploy: dedupe volumes_info by disk_id; the leader gets replicate_disk_data=True, followers get False)
  • ReplicateDisksTask: skip the provider's replicate_disks call for volumes with replicate_disk_data=False and merge the results back in export disk order
  • On set_task_error: abort peer tasks of the same type that are stuck in SYNCING
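One way to picture the deploy-side dedupe is a hypothetical helper that walks each instance's volumes_info and lets the first instance that references a given disk_id lead. Only the flag assignment is shown; how the PR actually chooses the leader and rewrites follower entries is not spelled out here, so the sorted-instance ordering below is an assumption made to keep the choice deterministic.

```python
import copy


def dedupe_shared_volumes(volumes_by_instance):
    """Mark one leader per disk_id; followers skip data replication.

    `volumes_by_instance` maps an instance name to its volumes_info list,
    where each volume is a dict with at least a "disk_id" key. The input
    is not modified; annotated copies are returned.
    """
    result = {}
    seen = set()
    # Hypothetical leader rule: instances are visited in sorted order.
    for instance in sorted(volumes_by_instance):
        annotated = []
        for vol in volumes_by_instance[instance]:
            vol = copy.deepcopy(vol)
            # The first instance to claim a disk_id replicates its data.
            vol["replicate_disk_data"] = vol["disk_id"] not in seen
            seen.add(vol["disk_id"])
            annotated.append(vol)
        result[instance] = annotated
    return result
```

Disks that are not shared always end up with replicate_disk_data=True, so single-owner volumes are replicated exactly as before.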

The volumes schema already allows additional properties; replicate_disk_data is consumed only by replication, and its default of True preserves the existing behavior.
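The ReplicateDisksTask filtering can be sketched as follows, assuming volumes_info is a list of dicts keyed by disk_id and that replicate_fn stands in for the provider's replicate_disks call (both hypothetical here). A missing flag counts as True, matching the behavior-preserving default described above, and the merged result is re-sorted into export disk order.

```python
def replicate_selected(volumes_info, export_disk_ids, replicate_fn):
    """Replicate only flagged volumes, then merge in export disk order.

    `replicate_fn` is a hypothetical stand-in for the provider call; it
    receives the volumes to replicate and returns their updated dicts.
    """
    to_replicate = [v for v in volumes_info
                    if v.get("replicate_disk_data", True)]
    skipped = [v for v in volumes_info
               if not v.get("replicate_disk_data", True)]

    # Followers never reach the provider; only leaders copy disk data.
    replicated = list(replicate_fn(to_replicate)) if to_replicate else []

    order = {disk_id: i for i, disk_id in enumerate(export_disk_ids)}
    merged = replicated + skipped
    # sorted() is stable, so volumes absent from the export list keep
    # their relative order at the end.
    return sorted(merged, key=lambda v: order.get(v["disk_id"], len(order)))
```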

fabi200123 force-pushed the cor-707 branch 4 times, most recently from 9813d2b to 61ebc94 on April 9, 2026 01:21
fabi200123 force-pushed the cor-707 branch 3 times, most recently from 173c46b to 6433f4e on May 13, 2026 13:12

Introduce TASK_STATUS_SYNCING and TASK_TYPES_TO_SYNC (GET_INSTANCE_INFO,
DEPLOY_TRANSFER_DISKS, REPLICATE_DISKS) so that multi-instance transfers
with base_transfer_action.clustered=True wait for all peer tasks of the
same type before moving from SYNCING to COMPLETED and advancing their
dependents.

- clustered is set to len(instances) > 1 when a transfer is created
- On task_completed: enter SYNCING when the barrier applies; once every
  peer is SYNCING, run the per-type sync hooks (GET_INSTANCE_INFO:
  promote shareable on export disks; DEPLOY_TRANSFER_DISKS: shared-disk
  volumes_info; REPLICATE_DISKS: sync change_id)
- ReplicateDisksTask: skip the provider replicate call for volumes with
  replicate_disk_data=False
- On task error: abort peers stuck in SYNCING for the same task type
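The error path can be sketched in the same spirit: when one task fails, peers of the same type that are parked in SYNCING would otherwise wait forever, so they are aborted. The dict-based tasks, on_task_error and the CANCELED status name are assumptions for illustration; the text above only says that such peers are aborted.

```python
TASK_STATUS_SYNCING = "SYNCING"
TASK_STATUS_ERROR = "ERROR"
# Hypothetical name for the aborted state; the real constant may differ.
TASK_STATUS_CANCELED = "CANCELED"


def on_task_error(failed, peers):
    """Mark `failed` as errored and abort same-type peers stuck in SYNCING.

    Tasks are plain dicts with "task_type" and "status" keys; `peers` may
    include `failed` itself. Returns the aborted peers.
    """
    failed["status"] = TASK_STATUS_ERROR
    aborted = []
    for peer in peers:
        if peer is failed:
            continue
        # Only peers waiting at the same barrier can no longer be released.
        if (peer["task_type"] == failed["task_type"]
                and peer["status"] == TASK_STATUS_SYNCING):
            peer["status"] = TASK_STATUS_CANCELED
            aborted.append(peer)
    return aborted
```

Peers of other task types are left alone, because their own barrier may still be satisfiable.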