
[ECS-Plugin]: Implement Primary Rollout and Canary Rollout Stage#6587

Merged
khanhtc1202 merged 4 commits into pipe-cd:master from armistcxy:ecs-plugin/rollout on Mar 23, 2026

@armistcxy
Contributor

@armistcxy armistcxy commented Mar 13, 2026

What this PR does: Implements the Primary Rollout and Canary Rollout stages.

Why we need it:

Primary Rollout Stage

sequenceDiagram
        participant Plugin as ECSPrimaryRolloutStage
        participant Provider as ECS Client
        participant ECS as AWS ECS

        Plugin->>Plugin: Load AppConfig
        Plugin->>Plugin: Load TaskDefinition
        Plugin->>Plugin: Load ServiceDefinition
        Plugin->>Provider: Create ECS client

        alt AccessType == ELB
            Plugin->>Plugin: Load primary TargetGroup
        end

        Plugin->>Provider: RegisterTaskDefinition
        Provider->>ECS: RegisterTaskDefinition
        ECS-->>Provider: TaskDefinition

        Plugin->>Provider: ApplyServiceDefinition
        Provider->>ECS: Create/Update Service
        ECS-->>Provider: Service

        Plugin->>Provider: GetPrimaryTaskSet
        Provider->>ECS: DescribeTaskSets
        ECS-->>Provider: Current Primary TaskSet (or nil)

        Plugin->>Provider: CreateTaskSet (100% scale)
        Provider->>ECS: CreateTaskSet
        ECS-->>Provider: New TaskSet

        Plugin->>Provider: Promote TaskSet to PRIMARY
        Provider->>ECS: UpdateServicePrimaryTaskSet

        alt Previous primary taskset exists
            Plugin->>Provider: DeleteTaskSet (old primary)
            Provider->>ECS: DeleteTaskSet
        end

        Plugin->>Provider: WaitServiceStable
        Provider->>ECS: DescribeServices (poll)
        ECS-->>Provider: Stable

        Plugin-->>Plugin: StageStatusSuccess


Canary Rollout Stage

    sequenceDiagram
        participant Plugin as ECSCanaryRolloutStage
        participant Provider as ECS Client
        participant ECS as AWS ECS
        participant Meta as Metadata Store

        Plugin->>Plugin: Load AppConfig
        Plugin->>Plugin: Parse StageOptions (scale)
        Plugin->>Provider: Create ECS client

        Plugin->>Plugin: Load TaskDefinition
        Plugin->>Plugin: Load ServiceDefinition

        alt AccessType == ELB
            Plugin->>Plugin: Load TargetGroups
            alt Canary target group missing
                Plugin->>Plugin: Log error
                Plugin-->>Plugin: StageStatusFailure
            end
        end

        Plugin->>Provider: RegisterTaskDefinition
        Provider->>ECS: RegisterTaskDefinition
        ECS-->>Provider: TaskDefinition

        Plugin->>Provider: ApplyServiceDefinition
        Provider->>ECS: Create/Update Service
        ECS-->>Provider: Service

        Plugin->>Provider: CreateTaskSet (canary LB, scale%)
        Provider->>ECS: CreateTaskSet
        ECS-->>Provider: New TaskSet

        Plugin->>Provider: WaitServiceStable
        Provider->>ECS: DescribeServices (poll)
        ECS-->>Provider: Stable

        Plugin->>Meta: PutDeploymentPluginMetadata (canary task set)

        Plugin-->>Plugin: StageStatusSuccess

Which issue(s) this PR fixes: Part of #6443

Fixes #

Does this PR introduce a user-facing change?:

  • How are users affected by this change:
  • Is this breaking change:
  • How to migrate (if breaking change):

@armistcxy
Contributor Author

The function createPrimaryTaskSet should focus only on creating the new taskset and marking it as PRIMARY. Deleting old tasksets is out of scope and has nothing to do with this function.
The following code section should be moved elsewhere instead of living inside createPrimaryTaskSet:

	// Get current PRIMARY, Active
	lp.Infof("Getting current active task sets for the service %s", *service.ServiceName)
	prevTaskSets, err := client.GetServiceTaskSets(ctx, service)
	if err != nil {
		return fmt.Errorf("failed to get service task sets: %w", err)
	}
       .... 
	lp.Infof("Deleting old task sets for service %s", *service.ServiceName)
	for _, prevTaskSet := range prevTaskSets {
		if err = client.DeleteTaskSet(ctx, prevTaskSet); err != nil {
			return fmt.Errorf("failed to delete old task set %s: %w", *prevTaskSet.TaskSetArn, err)
		}
	}

The stage itself will decide whether to delete old tasksets, controlled by a stage option.

@armistcxy
Contributor Author

armistcxy commented Mar 19, 2026

After discussing with @khanhtc1202, we both think that the ECS_PRIMARY_ROLLOUT stage should only roll out the primary taskset, which consists of 3 actions:

  1. Create new taskset
  2. Promote new taskset to primary taskset
  3. Delete the old primary taskset

The old v0 implementation of the ECS_PRIMARY_ROLLOUT stage uses the same createPrimaryTaskSet function as the ECS_SYNC stage:

if in.StageConfig.Name == model.StageECSPrimaryRollout {
	// Create PRIMARY task set in case of Primary rollout.
	if err := createPrimaryTaskSet(ctx, client, *service, *td, targetGroup); err != nil {
		in.LogPersister.Errorf("Failed to roll out ECS task set for service %s: %v", *serviceDefinition.ServiceName, err)
		return false
	}

And this function hides a critical action: deleting the old tasksets (which I suppose is fine for the sync stage, but is dangerous inside the primary rollout stage because it is not the intent of primary rollout):

	// Remove old taskSets if existed.
	// HACK: All old task sets including canary are deleted here.
	// However, we need to discuss whether we should delete the canary here or in later stage(CanaryClean).
	for _, prevTaskSet := range prevTaskSets {
		if err = client.DeleteTaskSet(ctx, *prevTaskSet); err != nil {
			return err
		}
	}
	return nil

@armistcxy armistcxy force-pushed the ecs-plugin/rollout branch 2 times, most recently from b1753bd to f9b84a5 Compare March 20, 2026 04:15
Signed-off-by: Hoang Ngo <adlehoang118@gmail.com>
Signed-off-by: Hoang Ngo <adlehoang118@gmail.com>
Signed-off-by: Hoang Ngo <adlehoang118@gmail.com>
… tasksets

Signed-off-by: Hoang Ngo <adlehoang118@gmail.com>
@armistcxy armistcxy force-pushed the ecs-plugin/rollout branch from f9b84a5 to 69fdce0 Compare March 20, 2026 04:35
Member

@khanhtc1202 khanhtc1202 left a comment


LGTM

@khanhtc1202 khanhtc1202 merged commit 223a972 into pipe-cd:master Mar 23, 2026
45 checks passed
@github-actions github-actions Bot mentioned this pull request May 11, 2026

2 participants