LPX-578: sync:fargate command for ECS + ALB adoption by stevethomas · Pull Request #25 · codinglabsau/yolo

stevethomas · 2026-05-19T07:28:19Z

What this ships

sync:compute — the full provisioning surface for Fargate-shaped YOLO apps.

Tracking: LPX-578.

TL;DR

yolo init
yolo sync production       # provisions VPC/storage/standalone + compute (this PR)
yolo build production      # PR #26
yolo deploy production     # PR #26

SyncCommand only enrols SyncComputeCommand when the manifest has tasks.web — non-Fargate consumers stay unchanged.

Steps in `sync:compute` (in order)

SyncEcrRepositoryStep — creates ECR repo + attaches a lifecycle policy. Defaults: expire untagged after 7d, keep last 30 tagged. Overridable via tasks.web.image-retention.{keep-count,untagged-days}.
SyncEcsClusterStep — Fargate + FARGATE_SPOT capacity providers, container insights enabled.
SyncTaskSecurityGroupStep — task ENI security group with inbound from the ALB SG only.
SyncLoadBalancerStep — internet-facing ALB on the public subnets, looked up by aws.alb or defaulted to the env-shared name yolo-{env}.
SyncTargetGroupStep — IP-target group with configurable health check.
SyncHttpListenerStep — HTTP:80 → HTTPS:443 permanent redirect.
SyncHttpsListenerStep — HTTPS:443 with ACM cert attached. Returns SKIPPED cleanly when the cert isn't ISSUED yet (re-sync after DNS validation completes) or when no apex/domain is set.
SyncListenerRuleStep — host-based routing rule with bounded priority assignment in the 1000-49999 range. Pure nextAvailablePriority() extracted and unit-tested for wrap, exhaustion, determinism, range.
SyncTaskLogGroupStep — pre-creates the CloudWatch log group with configurable retention (tasks.web.log-retention, default 30d).
SyncTaskDefinitionStep — Fargate-compatible task definition (awsvpc, configurable cpu/memory).
SyncEcsServiceStep — ECS service with rolling deployment config (min 100% / max 200%). Drift detection covers desiredCount + healthCheckGracePeriodSeconds. Task-definition revision adoption is owned by yolo deploy, not sync.

Manifest shape

environments:
  production:
    domain: codinglabs.com.au
    aws:
      account-id: 848509375702
      region: ap-southeast-2
      # alb: my-shared-alb   # optional, defaults to per-env shared (yolo-{env})
    tasks:
      web:
        port: 8000
        cpu: '512'
        memory: '1024'
        desired-count: 1
        # all optional, defaults shown:
        # dockerfile: Dockerfile
        # platform: linux/amd64
        # execution-role: ecsTaskExecutionRole
        # task-role: ~                              # omit unless app needs SES/SQS/S3 etc.
        # log-retention: 30
        # image-retention:
        #   keep-count: 30
        #   untagged-days: 7
        # health-check:
        #   path: /health
        #   grace-period: 60
        # enable-execute-command: false

Manifest value validation

Three values flow into AWS API payloads through strict validators that throw IntegrityCheckException with a clear message on bad input, before any AWS round trip:

tasks.web.enable-execute-command — Helpers::validateStrictBool. The YAML-quoted string "false" parses as false (PHP's native (bool) cast would return true).
tasks.web.image-retention.{keep-count,untagged-days} — Helpers::validatePositiveInt. Non-numeric strings rejected (not coerced to 0).
tasks.web.log-retention — Helpers::validateCloudWatchLogRetention. Restricted to CloudWatch's accepted enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653).

Other AWS plumbing

Aws::ecr() / Aws::ecs() SDK client registrations.
Aws::ecsTags() helper for the lowercase key/value tag shape ECS expects.
UsesEcr / UsesEcs concerns mixed into AwsResources.
publicSubnetIds() on UsesEc2.
New SecurityGroup::ECS_TASK_SECURITY_GROUP + SecurityGroupRule::ECS_TASK_LB_INGRESS_RULE enum cases.
UsesElasticLoadBalancingV2::targetGroup() uses per-app naming (yolo-{env}-{app}) so multiple apps can share one ALB with separate TGs.
UsesEc2::securityGroups() is intentionally un-memoised — steps frequently describe → catch-not-found → create → re-describe in a single sync and a stale cache silently breaks the re-read.
UsesEcs::ecsService() and ecsTaskDefinition() translate any AwsException (incl. ClusterNotFoundException on cold accounts) into the project's standard ResourceDoesNotExistException so the calling step's catch handles dry-run + create cleanly.

Known follow-ups (separate tickets)

LPX-596 — strip the remaining static caches from UsesEc2/UsesEcs/UsesEcr/UsesElasticLoadBalancingV2.
LPX-597 — ALB attribute polish (deletion protection, access logs, hardened defaults). Land before production cutover.
LPX-598 — deploy hardening polish (cross-env safety, revision pinning, log surfacing, circuit breaker, …). Surfaced by the PR LPX-576 + LPX-579: build + deploy commands for first Fargate POC #26 re-review; bundled there since most items touch the deploy path.
IAM provisioning for ecsTaskExecutionRole + per-app task roles.
destroy:compute teardown command.
Stub + README pass to reflect the Fargate manifest shape.

Reviewer notes

IAM is out-of-scope. tasks.web.execution-role defaults to the bare 'ecsTaskExecutionRole' role name; the AWS-managed role must exist in the target account or registerTaskDefinition fails.
ALB stays in provisioning for ~2-5 min after creation. AWS accepts listener creates against a provisioning ALB, so sync proceeds — but sync:compute returns before the ALB is actually serving traffic.
ECR repo is empty after first sync; the task-def's default :latest reference doesn't resolve until yolo build has pushed an image. PR LPX-576 + LPX-579: build + deploy commands for first Fargate POC #26 ships the build + deploy half.
91 pest, 0 phpstan, pint clean.

Adds the sync:fargate command and steps for provisioning the Fargate compute stack: ECR repo, ECS cluster, task SG, ALB (lookup-or-create by aws.alb manifest key), target group, HTTP/HTTPS listeners, host based listener rule, task definition, and ECS service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

stevethomas

Review Summary

Verdict: ❌ Changes Requested

Two blockers, six warnings worth landing fixes for, several notes. The blockers both hit on first run for a current dev-main consumer (CL marketing site is on dev-main right now), so they need to land before this merges.

Risk Assessment

Category	Status
Data Integrity	❌ Blocker
Security	⚠️ Warnings
Breaking Changes	❌ Blocker
Operational	⚠️ Warnings
Test Coverage	⚠️ Warnings

Blockers

src/Steps/Fargate/SyncTaskSecurityGroupStep.php:64 — first-run create path is broken by a stale-cache bug. UsesEc2::securityGroups() memoises into static::$securityGroups on the failed lookup at line 23. After createSecurityGroup succeeds, the re-fetch at line 64 hits the cached pre-create list and throws ResourceDoesNotExistException. Verified by tracing UsesEc2::securityGroupByName() → UsesEc2::securityGroups() — no refresh between the two calls. Same pattern exists in the alpha-era SyncEc2SecurityGroupStep (out of scope for this PR but worth noting).
src/Commands/SyncCommand.php:39 — SyncFargateCommand is enrolled in the top-level sync orchestration with no manifest gate. For any current dev-main consumer with a non-Fargate manifest (the CL marketing site yolo.yml is one), running yolo sync production now silently provisions a new ALB (~AU$24/mo), an ECS cluster (containerInsights enabled — ongoing CloudWatch cost), an ECR repo, and a task SG. Conflicts with Steve's standing rule that infra commands need explicit confirmation. Gate with Manifest::has('tasks.web') or have each Fargate step short-circuit to SKIPPED when tasks.web is absent.

Warnings (inline)

See inline comments on:

SyncListenerRuleStep.php:96 — priority collision walk escapes the 1000-49999 floor
SyncEcrRepositoryStep.php:25 — no lifecycle policy, images accumulate forever
SyncTaskDefinitionStep.php:60 — log group auto-created with no retention policy
SyncEcsServiceStep.php:38 — narrow drift detection; service never adopts new task def revisions
UsesElasticLoadBalancingV2.php:42 — TG name lookup change, ELBv2's 32-char Name cap unvalidated
SyncHttpsListenerStep.php:18 — Manifest::apex() TypeErrors on domain-less manifests (only caught by ResourceDoesNotExistException)

Notes (no inline comment)

Existence-only checks across the Fargate suite — SyncEcsClusterStep, SyncTargetGroupStep, SyncEcsServiceStep (mostly) all return SYNCED when a resource exists, regardless of manifest drift. Health-check path, capacity providers, network config, deployment config — all silently masked. Pattern issue, worth a follow-up that establishes a consistent drift convention.
Task definition revisions accumulate — SyncTaskDefinitionStep re-registers a fresh revision on every sync with no payload diff. Combined with the service-not-adopting issue above, every sync churns a new revision that the service ignores.
Public subnet placement — Fargate tasks land in public subnets with assignPublicIp: ENABLED. Costs ~AU$5.40/mo per task in IPv4 fees and diverges from standard Fargate posture. Acceptable for the alpha (no private subnets in sync:network yet) but call it out in docs.
ECR imageTagMutability: MUTABLE — fine for :latest rolling but precludes the digest-pin / hardened-deploy path; flip to IMMUTABLE when the deploy command lands if it's digest-pinned.
Cert status fall-through — SyncHttpsListenerStep:23 skips on any status !== 'ISSUED'. Correct behaviour but EXPIRED/REVOKED/FAILED get the same SKIPPED signal as PENDING_VALIDATION, masking real cert problems during sync.
No teardown path — no destroy:fargate (or equivalent) ships with this PR. Half-failed syncs leave an awful manual-cleanup checklist (cluster → service → task def revisions → listener rule → listeners → TG → ALB → task SG → ECR repo → log group). File the follow-up.
Stub + README still alpha-shaped — stubs/yolo.yml.stub is EC2/ASG-shaped with no aws.alb, aws.ecs.security-group, ecs.cluster, or tasks.web.* representation. README's one-liner promises sync works end-to-end but the stub doesn't reflect it. Either update the stub or add a docs/manifest.md.

Test coverage gaps

SyncListenerRuleStep::priority() + findRuleForHosts() — zero coverage. Highest-value missing tests given the multi-tenant priority-collision risk.
SyncEcsServiceStep update-vs-create branching — four behavioural paths, zero coverage. healthCheckGracePeriodSeconds ?? 0 coalesce will silently churn updateService calls on every sync against an existing service.
SyncTaskSecurityGroupStep tag-key drift — read filter and write tag are coupled by string match across the same file; a small test pinning the tag key in both places would catch any future drift.
Dry-run honesty across all 10 Fargate steps — no test confirms --dry-run actually avoids SDK writes. Trivial to add (bind mock client that throws on writes).

Naming

Separately raising with @stevethomas: sync:fargate ties the command name to the implementation when Fargate IS the implementation. sync:compute slots next to sync:network / sync:storage / sync:standalone and reclaims the v1-alpha vocabulary. Step namespace (src/Steps/Fargate/) stays Fargate-specific. Rename incoming as a follow-up commit on this branch.

🤖 Reviewed by Claude Code

stevethomas · 2026-05-19T11:08:41Z

+                ],
+            ]);
+
+            $securityGroup = AwsResources::ecsTaskSecurityGroup();


Finding: First-run create path will always throw ResourceDoesNotExistException here. UsesEc2::securityGroups() memoises into static::$securityGroups on the failed lookup at line 23 (the call inside the try block populates the cache with the pre-create list, then throws). After createSecurityGroup succeeds on line 52, the re-fetch at line 64 calls securityGroupByName() → securityGroups() again, hits the cached pre-create list, doesn't find the new SG, throws. The same pattern exists in the alpha-era SyncEc2SecurityGroupStep:110 (out of scope for this PR but worth a separate ticket).
_{Severity: BLOCKER}

Suggested Fix

Use the GroupId returned by createSecurityGroup directly instead of re-fetching via the cached lookup:

$result = Aws::ec2()->createSecurityGroup([...]); Aws::ec2()->authorizeSecurityGroupIngress( static::loadBalancerIngressRule(['GroupId' => $result['GroupId']], $containerPort) );

Or force a refresh by calling AwsResources::securityGroups(refresh: true) before the re-fetch. The SDK-response-direct approach is one fewer round trip.

Prompt

Review the code at src/Steps/Fargate/SyncTaskSecurityGroupStep.php:64 alongside src/Concerns/UsesEc2.php:100-110. A potential issue has been flagged: UsesEc2::securityGroups() memoises into a static property; the first call inside this step (line 23 via ecsTaskSecurityGroup → securityGroupByName) populates the cache with the pre-create SG list, then throws. After createSecurityGroup succeeds at line 52, the re-fetch at line 64 hits the stale cache and throws ResourceDoesNotExistException, breaking first-run sync. Verify whether this is real — if so, propose a fix using the createSecurityGroup response GroupId directly; if not, explain why.

stevethomas · 2026-05-19T11:08:41Z

                : [
                    SyncStandaloneCommand::class,
                ],
+            SyncFargateCommand::class,


Finding: SyncFargateCommand is enrolled in the top-level sync orchestration unconditionally, with no manifest guard. For any current dev-main consumer with a non-Fargate manifest (the CL marketing site yolo.yml matches this shape today), running yolo sync production will silently provision an ALB (~AU$24/mo), an ECS cluster with containerInsights enabled (CloudWatch cost), an ECR repo, and a task SG that they didn't ask for. Conflicts with Steve's standing rule that infra commands need explicit consent — currently it'd run on the first sync after upgrading their dev-main pin without any opt-in. Also: SyncHttpsListenerStep:18 and SyncListenerRuleStep:19 both call Manifest::apex() (return-typed : string) inside a try/catch that only catches ResourceDoesNotExistException, so a domain-less manifest crashes the whole sync run with an unhandled TypeError instead of cleanly skipping.
_{Severity: BLOCKER}

Suggested Fix

Gate the enrolment behind a manifest opt-in:

collect([ SyncNetworkCommand::class, SyncStorageCommand::class, ...Manifest::isMultitenanted() ? [...] : [SyncStandaloneCommand::class], ...Manifest::has('tasks.web') ? [SyncFargateCommand::class] : [], SyncIamCommand::class, SyncLoggingCommand::class, ])->each(...);

OR have each Fargate step return StepResult::SKIPPED when tasks.web is absent. The first form is cheaper and matches the existing multitenancy gating shape just above. Pairs naturally with the sync:compute rename — same gate, new command name.

Prompt

Review the code at src/Commands/SyncCommand.php:39. A potential issue has been flagged: SyncFargateCommand runs unconditionally as part of the top-level sync, so any current dev-main consumer with a non-Fargate manifest (e.g. CL marketing site) will silently get a new ALB, ECS cluster, ECR repo, and task SG provisioned without consent. Also crashes on domain-less manifests because SyncHttpsListenerStep/SyncListenerRuleStep call the return-typed Manifest::apex() inside a try/catch that only catches ResourceDoesNotExistException. Verify and propose either a manifest gate on enrolment or per-step SKIPPED short-circuits.

stevethomas · 2026-05-19T11:08:41Z

+        while (in_array($base, $usedPriorities, true)) {
+            $base = ($base % 50000) + 1;
+        }
+


Finding: The collision walk $base = ($base % 50000) + 1 can produce priorities below 1000 (the intended floor) and walk into priority slots typically reserved for manual high-priority overrides — base=49999 collides, next is 1, then 2, then 3, etc. Also: no bounded termination, so if all 50000 slots are taken the loop spins forever. Additionally describeRules is called twice per invocation (once in findRuleForHosts(), again here) — minor API pressure, but in multi-tenant fan-out it doubles ELB describe TPS pressure (10 TPS cap per region).
_{Severity: WARNING}

Suggested Fix

Stay bounded within the intended range and bail on exhaustion:

$base = (abs(crc32(Helpers::keyedResourceName(exclusive: true))) % 49000) + 1000; $attempts = 0; while (in_array($base, $usedPriorities, true)) { if (++$attempts > 49000) { throw new IntegrityCheckException('ALB listener rule priority space exhausted'); } $base = $base >= 49999 ? 1000 : $base + 1; }

Add a unit test pinning: priority always within 1000..49999, deterministic for a given app name when no collision, walks correctly when collisions occur.

Prompt

Review the code at src/Steps/Fargate/SyncListenerRuleStep.php:82-99. A potential issue has been flagged: (a) the collision walk `$base = ($base % 50000) + 1` escapes the intended 1000-49999 floor and can land on priority 1, conflicting with manually-set rules; (b) no bounded termination — exhausted space loops forever; (c) describeRules is called twice per invocation. Verify and propose a bounded walk + a unit test for priority() and findRuleForHosts() using a Container-bound mock elasticLoadBalancingV2 client.

stevethomas · 2026-05-19T11:08:41Z

+                return StepResult::WOULD_CREATE;
+            }
+
+            Aws::ecr()->createRepository([


Finding: ECR repository is created with no lifecycle policy. Every build will push a new tag (PR #26 pushes :{app-version} + :latest), and ECR keeps everything forever by default. Default per-repo image limit is 10k images (raisable but a hard ceiling), and storage is billed at US$0.10/GB-mo per image layer. For LP's multi-tenant footprint this gets noticeable fast.
_{Severity: WARNING}

Suggested Fix

Follow up the createRepository call with a putLifecyclePolicy setting a sensible default (configurable via manifest):

Aws::ecr()->putLifecyclePolicy([ 'repositoryName' => AwsResources::ecrRepositoryName(), 'lifecyclePolicyText' => json_encode([ 'rules' => [ [ 'rulePriority' => 1, 'description' => 'Expire untagged images after 7 days', 'selection' => ['tagStatus' => 'untagged', 'countType' => 'sinceImagePushed', 'countUnit' => 'days', 'countNumber' => 7], 'action' => ['type' => 'expire'], ], [ 'rulePriority' => 2, 'description' => 'Keep last 30 tagged images', 'selection' => ['tagStatus' => 'any', 'countType' => 'imageCountMoreThan', 'countNumber' => 30], 'action' => ['type' => 'expire'], ], ], ]), ]);

Prompt

Review the code at src/Steps/Fargate/SyncEcrRepositoryStep.php:19-30. A potential issue has been flagged: ECR repository is created with no lifecycle policy — images accumulate forever and eventually hit the soft cap of 10k per repo plus unbounded storage cost. Propose a default putLifecyclePolicy (expire untagged after 7d, keep last 30 tagged) overridable via manifest, e.g. tasks.web.image-retention.

stevethomas · 2026-05-19T11:08:41Z

+                            'awslogs-stream-prefix' => 'web',
+                            'awslogs-create-group' => 'true',
+                        ],
+                    ],


Finding: awslogs-create-group: 'true' lets the awslogs driver auto-create the log group on first task launch. The auto-created group has no retention policy (logs never expire), which means log storage costs grow linearly forever. Worse, the group ends up tagged/owned outside YOLO's normal sync convention — there's no AwsResources::logGroup() lookup pointing at it for future drift detection.
_{Severity: WARNING}

Suggested Fix

Pre-create the log group explicitly via a new SyncTaskLogGroupStep (added to SyncFargateCommand::$steps before SyncTaskDefinitionStep), with a configurable retention from tasks.web.log-retention (default e.g. 30 days). Then flip awslogs-create-group: 'false' here so the driver doesn't shadow-own the group.

Prompt

Review the code at src/Steps/Fargate/SyncTaskDefinitionStep.php:52-62. A potential issue has been flagged: awslogs-create-group: 'true' lets the awslogs driver auto-create the CloudWatch log group with infinite retention, growing log storage costs forever. Propose a new SyncTaskLogGroupStep that pre-creates the group with a configurable retention (default 30 days) and disables awslogs-create-group here.

stevethomas · 2026-05-19T11:08:41Z

+                'cluster' => AwsResources::ecsClusterName(),
+                'service' => AwsResources::ecsServiceName(),
+                'desiredCount' => $desiredCount,
+                'healthCheckGracePeriodSeconds' => $gracePeriod,


Finding: Drift detection is too narrow. Only desiredCount and healthCheckGracePeriodSeconds are diffed; everything else (taskDefinition revision, network config, capacity provider strategy, deployment config, execute-command toggle) is silently masked as SYNCED. Combined with SyncTaskDefinitionStep always registering a new revision per sync, this means: every sync churns a new task def revision that the service never adopts. Service stays pinned to whichever revision it was originally created with, accumulating an orphan trail of registered-but-unused revisions.

Also: the healthCheckGracePeriodSeconds ?? 0 coalesce will report drift on every sync when AWS returns a service without that field set (older create lineage) against a manifest default of 60 — silently churning updateService calls.
_{Severity: WARNING}

Suggested Fix

Decide whether sync or deploy owns task-def revision adoption. If deploy (matches PR #26), add an explicit comment here: // task definition revision is managed by deploy command, not sync. Either way, widen the drift detection — at minimum check taskDefinition ARN against the latest revision, and consider extracting a payloadHasDifferences helper similar to Helpers::payloadHasDifferences() already used in SyncEc2SecurityGroupStep.

For the coalesce churn: pull healthCheckGracePeriodSeconds default from the manifest only once and compare against ?? of the same default, not 0.

Prompt

Review the code at src/Steps/Fargate/SyncEcsServiceStep.php:20-41 alongside SyncTaskDefinitionStep.php. A potential issue has been flagged: drift detection only covers desiredCount + healthCheckGracePeriodSeconds. The service never adopts new task definition revisions registered by SyncTaskDefinitionStep, accumulating orphan revisions. The `healthCheckGracePeriodSeconds ?? 0` coalesce also reports drift on every sync against older services. Verify intent (does deploy own revision adoption?) and either widen drift detection or document the create-only contract.

stevethomas · 2026-05-19T11:08:41Z

        }

-        $targetGroups = Aws::elasticLoadBalancingV2()->describeTargetGroups();
+        $name = Helpers::keyedResourceName(exclusive: true);


Finding: Target group name lookup switched from Helpers::keyedResourceName(exclusive: false) (= yolo-{env}, shared across apps) to exclusive: true (= yolo-{env}-{app}, per-app). PR body confirms no current consumer relies on the old name. Two gotchas worth pinning:

ELBv2 Name cap is 32 chars. yolo-production- is 16 chars; that leaves 16 chars for the app name. Anything ≥17 chars hits AWS validation at create time with no upfront manifest precheck. Add validation in Manifest or in this concern.

No regression test pins the new name. A 5-line test asserting the Names => [...] arg shape passed to describeTargetGroups would document the intent and catch any future drift back to the old shared name.
_{Severity: WARNING}

Suggested Fix

Add a manifest precheck (somewhere in Command::execute() or as a new Steps/Ensures/EnsureResourceNameLengths step):

if (strlen(Helpers::keyedResourceName(exclusive: true)) > 32) { throw new IntegrityCheckException(sprintf( "Target group name 'yolo-%s-%s' exceeds ELB's 32-char limit. Shorten the app name to %d chars or less.", Helpers::environment(), Manifest::name(), 32 - strlen("yolo-" . Helpers::environment() . "-") )); }

And a regression test under tests/Unit/Concerns/UsesElasticLoadBalancingV2Test.php pinning the lookup name shape.

Prompt

Review the code at src/Concerns/UsesElasticLoadBalancingV2.php:36-63. A potential issue has been flagged: the lookup naming changed from exclusive:false to exclusive:true. ELBv2's 32-char Name cap means app names ≥17 chars will fail at create time with no upfront precheck. Add a manifest validation pass (early-fail with a clear message) plus a regression test pinning the Names arg shape passed to describeTargetGroups.

stevethomas · 2026-05-19T11:08:41Z

+    public function __invoke(array $options): StepResult
+    {
+        try {
+            $certificate = AwsResources::certificate(Manifest::apex());


Finding: Manifest::apex() has a : string return type and will TypeError (not throw ResourceDoesNotExistException) when neither apex nor domain is set in the manifest. The try/catch only catches ResourceDoesNotExistException, so a domain-less Fargate-shaped manifest blows up the whole sync run with an unhandled TypeError instead of cleanly skipping. Same shape in SyncListenerRuleStep.php:19.
_{Severity: WARNING}

Suggested Fix

Guard with a manifest presence check that returns SKIPPED:

if (! Manifest::has('apex') && ! Manifest::has('domain')) { return StepResult::SKIPPED; }

OR change Manifest::apex() to throw a catchable IntegrityCheckException (or similar) when both keys are absent — but that's a wider blast radius affecting alpha-era code, so the per-step guard is the safer fix for this PR.

Prompt

Review the code at src/Steps/Fargate/SyncHttpsListenerStep.php:17-21 and src/Steps/Fargate/SyncListenerRuleStep.php:19-23. A potential issue has been flagged: Manifest::apex() return type is `: string` and will TypeError on domain-less manifests, but the try/catch only catches ResourceDoesNotExistException. Verify and propose either a Manifest::has() guard returning SKIPPED, or a wider change making Manifest::apex() throw a catchable exception.

`sync:fargate` ties the command name to the implementation when Fargate *is* the implementation. `sync:compute` slots cleanly next to `sync:network` / `sync:storage` / `sync:standalone` / `sync:iam` / `sync:logging` and reclaims the v1-alpha vocabulary so anyone reading old docs lands in the right place. Public surface is the command name — durable. Step namespace (`src/Steps/Fargate/`) stays Fargate-specific because the steps *are* concretely Fargate (SyncEcsClusterStep, SyncTargetGroupStep). If a second compute backend ever materialises, add `src/Steps/X/` and the command picks the step set by manifest shape; the user-facing command stays put. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Blockers: - SyncTaskSecurityGroupStep: use createSecurityGroup response GroupId directly instead of re-fetching through UsesEc2::securityGroups(), which memoises pre-create state into a static property and would throw `Could not find Security Group matching name yolo-...-ecs-task- security-group` on every first-run sync. - SyncCommand: gate SyncComputeCommand enrolment behind `Manifest::has('tasks.web')` so existing non-Fargate consumers don't silently get a brand-new ALB / ECS cluster / ECR repo provisioned on their next `yolo sync`. Warnings: - SyncListenerRuleStep::nextAvailablePriority(): extract the pure walk out of priority() and bound it to 1000-49999, wrap floor↔ceiling on collision rather than escaping into the reserved <1000 range, and bail with IntegrityCheckException after a full sweep instead of spinning forever. 6 new unit tests pin the contract. - SyncEcrRepositoryStep: putLifecyclePolicy after createRepository with sensible defaults (expire untagged after 7d, keep last 30 tagged), overridable via `tasks.web.image-retention.{keep-count, untagged-days}`. - SyncTaskLogGroupStep (new): pre-create the CloudWatch log group before the task definition is registered, with a configurable retention (`tasks.web.log-retention`, default 30 days). Drops the `awslogs-create-group: true` driver flag from the task definition so the awslogs driver doesn't shadow-own the group. - SyncEcsServiceStep: replace `?? 0` with `?? $gracePeriod` so older services without `healthCheckGracePeriodSeconds` set don't churn an updateService call on every sync. Comment makes the sync-vs-deploy ownership of task-def revisions explicit. - EnsureResourceNameLengthsStep (new): manifest precheck for AWS's 32-char Name cap on target groups + ECS services. Fails fast with a clear message naming the offending key and the allowable length. - SyncHttpsListenerStep + SyncListenerRuleStep: short-circuit to SKIPPED when neither `apex` nor `domain` is set, instead of TypeErroring on Manifest::apex()'s `: string` return type. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

stevethomas

Re-Review Summary (post-fix)

Verdict: ❌ Changes Requested

The first-round blockers + 6 warnings are addressed cleanly, but the re-review surfaced a new blocker that the original SG-cache fix didn't reach, plus a cluster of validation gaps that ride the same Manifest::get(..., default) | (int)/(bool) pattern.

Risk Assessment

Category	Status
Data Integrity	❌ Blocker (downstream stale-cache hit)
Security	⚠️ Warnings (cast validation gaps)
Breaking Changes	⚠️ Warnings (DNS step TypeError, stale stub)
Operational	⚠️ Warnings (retention enum / int validation)
Test Coverage	⚠️ Warnings (new code shipped without tests)

Blockers

src/Steps/Fargate/SyncEcsServiceStep.php:63 — the original SG stale-cache fix only resolved the same-step exposure in SyncTaskSecurityGroupStep. The downstream consumer SyncEcsServiceStep's create branch calls AwsResources::ecsTaskSecurityGroup()['GroupId'] for the awsvpcConfiguration. static::$ecsTaskSecurityGroup is unset (the catch in the SG step never seeded it), so it falls through to securityGroupByName() → securityGroups() which still returns the pre-create cached list. Throws ResourceDoesNotExistException, aborts sync mid-create. First-run sync:compute still fails — just one step later.

Warnings (inline)

SyncEcsServiceStep.php:69 — (bool) Manifest::get('tasks.web.enable-execute-command', false): PHP's (bool) cast on the YAML-quoted string "false" evaluates to true, silently enabling ECS Exec (shell into running tasks). Security-sensitive footgun.
SyncEcrRepositoryStep.php:42 — (int) casts on keep-count / untagged-days: garbage strings coerce to 0 (e.g. keep-count: thirty); floats truncate silently (keep-count: 1.7 → 1). AWS rejects countNumber: 0 server-side so it fails loud, but the failure surface is a deep AWS error rather than a clean preflight. Validate >= 1 up front.
SyncTaskLogGroupStep.php:18 — tasks.web.log-retention cast to int with no validation, but CloudWatch's putRetentionPolicy only accepts a fixed enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653). Reasonable-looking values like 45 or 0 throw InvalidParameterException at sync time.
src/Steps/Deploy/SyncStandaloneRecordSetStep.php:19 — Manifest::apex() TypeError exposure re-opens through DeployCommand (PR #26 stack base) for Fargate-only domain-less manifests. Same fix pattern as the SyncHttpsListenerStep / SyncListenerRuleStep guard already applied.
stubs/yolo.yml.stub:1 — original review WARNING #4 unaddressed. Still EC2/ASG-shaped with aws.autoscaling/aws.ec2 blocks and no tasks.web. A fresh yolo init produces a manifest where sync skips Fargate provisioning entirely.
tests/Unit/Steps/Fargate/SyncListenerRuleStepTest.php:30 — the "wraps from the ceiling back to the floor" test asserts < 49000 && >= 1000 but doesn't prove the floor-jump branch executed. A name whose CRC-derived base lands below 49000 already passes via the unrelated baseline. Tighten by computing or seeding a name that forces base==49999, then assert result == 1000 exactly.

Coverage gaps shipped this PR

SyncTaskLogGroupStep (new) — three branches (existing+match, existing+drift, missing→create), zero tests. At minimum a payload-pinning test on logGroupName().
SyncEcrRepositoryStep::lifecyclePolicy() (new) — manifest-driven JSON shape, zero coverage. Mirror the SyncTaskDefinitionStepTest pattern.
EnsureResourceNameLengthsStep (new) — boundary (32 chars exactly) not pinned. 3-case test would catch a future > ↔ >= flip.
SyncEcsServiceStep update-vs-create (carry-over) — still zero coverage on the diff predicate + create payload shape.

Notes (no inline)

describeLogGroups no pagination — findLogGroup() returns the first page only (default 50). Low-probability collision today but ergonomically annoying when it happens. Mirror onto UsesCloudWatchLogs::logGroup() which has the same gap.
SyncComputeCommand direct invocation — yolo sync:compute production against a non-tasks.web manifest bypasses the SyncCommand gate and provisions Fargate infra with all defaults. Either guard the sub-command or accept as power-user drift.
EnsureResourceNameLengthsStep format check — covers 32-char length, not AWS Name format rules (alphanumeric + hyphens, no leading/trailing/consecutive hyphens). Out-of-format names still surface as deep AWS errors.
Aws::tags(..., 'tags', associative: true)['tags'] round-trip in SyncTaskLogGroupStep — wraps then immediately unwraps. Cleaner with a dedicated Aws::associativeTags() helper.
First-pass notes still applicable — assignPublicIp: ENABLED, executionRoleArn bare default, imageTagMutability: MUTABLE, HTTPS cert Status !== 'ISSUED' fall-through — all unchanged, all tracked, no new exposure.
Carry-over coverage debt — SyncTaskSecurityGroupStep tag-key drift coupling, Aws::ecsTags() no direct test, dry-run honesty across the suite. None blocking; worth a tracking issue.

🤖 Reviewed by Claude Code

stevethomas · 2026-05-19T11:39:22Z

+                'networkConfiguration' => [
+                    'awsvpcConfiguration' => [
+                        'subnets' => AwsResources::publicSubnetIds(),
+                        'securityGroups' => [AwsResources::ecsTaskSecurityGroup()['GroupId']],


Finding: The original blocker fix in SyncTaskSecurityGroupStep:64 (use createSecurityGroup response GroupId directly) only dodged the stale-cache exposure inside that step. This line calls AwsResources::ecsTaskSecurityGroup() to populate the service's awsvpcConfiguration. static::$ecsTaskSecurityGroup is never set in the catch path of the SG step, so the lookup falls through to securityGroupByName() → UsesEc2::securityGroups() which still returns the pre-create cached list (populated by the initial failed lookup on SyncTaskSecurityGroupStep:23). The newly-created SG isn't in that list, so this throws ResourceDoesNotExistException and aborts sync mid-create. First-run sync:compute still fails — just one step later.
_{Severity: BLOCKER}

Suggested Fix

Seed the typed static cache inline after createSecurityGroup succeeds. Add a public setter to UsesEc2 (or a slim setEcsTaskSecurityGroup(array $sg) helper), then in SyncTaskSecurityGroupStep after the create + authorize:

$created = Aws::ec2()->createSecurityGroup([...]); Aws::ec2()->authorizeSecurityGroupIngress( static::loadBalancerIngressRule(['GroupId' => $created['GroupId']], $containerPort) ); AwsResources::setEcsTaskSecurityGroup(['GroupId' => $created['GroupId']]);

Downstream consumers only read ['GroupId'] so the minimal stub is sufficient. Belt-and-braces alternative is AwsResources::securityGroups(refresh: true) after create — costs an extra describe call but invalidates the whole pre-create cache. The targeted setter is cheaper.

Broader fix (kill the static caches entirely) is tracked in LPX-596.

Prompt

Review the code at src/Steps/Fargate/SyncEcsServiceStep.php#L63 alongside src/Concerns/UsesEc2.php#L158-L172 and src/Steps/Fargate/SyncTaskSecurityGroupStep.php#L45-L70. A potential issue has been flagged: the first-round SG cache fix dodged the stale-cache exposure inside SyncTaskSecurityGroupStep, but SyncEcsServiceStep's create branch re-resolves the same SG via AwsResources::ecsTaskSecurityGroup() which falls through to the still-stale UsesEc2::securityGroups() cache. Verify whether first-run sync:compute against a fresh environment will throw ResourceDoesNotExistException here. If real, propose seeding static::$ecsTaskSecurityGroup with the createSecurityGroup response GroupId via a setter on the trait.

stevethomas · 2026-05-19T11:39:22Z

+                ],
+                'loadBalancers' => [
+                    [
+                        'targetGroupArn' => AwsResources::targetGroup()['TargetGroupArn'],


Finding: (bool) Manifest::get('tasks.web.enable-execute-command', false) — PHP's (bool) cast on the YAML-quoted string "false" evaluates to true. YAML's loose-typing rules mean a manifest line like enable-execute-command: "false" (quoted by accident or by editor auto-formatting) silently enables ECS Exec, granting shell access into running containers to anyone holding ecs:ExecuteCommand. Security-sensitive enough to warrant a strict parse.
_{Severity: WARNING}

Suggested Fix

Use strict boolean parsing:

'enableExecuteCommand' => filter_var( Manifest::get('tasks.web.enable-execute-command', false), FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE, ) === true,

Or require === true so only the YAML boolean true enables it.

Prompt

Review src/Steps/Fargate/SyncEcsServiceStep.php#L69. A potential issue has been flagged: PHP's (bool) cast on the string "false" returns true, so a YAML-quoted enable-execute-command: "false" would silently enable ECS Exec. Verify whether the YAML loader prevents the misquote case; if not, propose strict boolean parsing.

stevethomas · 2026-05-19T11:39:22Z

+        }
+    }
+
+    protected static function lifecyclePolicy(): string


Finding: (int) casts on tasks.web.image-retention.keep-count and untagged-days silently coerce garbage. keep-count: thirty → 0 (string → int). keep-count: 1.7 → 1 (float truncation). AWS server-side rejects countNumber: 0, so a typo fails noisily — but the failure surfaces as a deep AWS API error at sync time, not a clean preflight. Worth a lower-bound + numeric validation up front.
_{Severity: WARNING}

Suggested Fix

Validate via filter_var with FILTER_VALIDATE_INT and a min_range option, throwing IntegrityCheckException on invalid input:

$keepCount = static::validatePositiveInt( Manifest::get('tasks.web.image-retention.keep-count', 30), 'tasks.web.image-retention.keep-count', ); $untaggedDays = static::validatePositiveInt( Manifest::get('tasks.web.image-retention.untagged-days', 7), 'tasks.web.image-retention.untagged-days', );

Pair with a unit test pinning the validation contract (test coverage warning above).

Prompt

Review src/Steps/Fargate/SyncEcrRepositoryStep.php#L42. A potential issue has been flagged: (int) casts on keep-count and untagged-days accept non-numeric strings (→ 0), floats, negatives silently. Propose a positive-int validator that throws IntegrityCheckException with a clear message before the AWS round trip.

stevethomas · 2026-05-19T11:39:22Z

+    public function __invoke(array $options): StepResult
+    {
+        $name = static::logGroupName();
+        $retention = (int) Manifest::get('tasks.web.log-retention', 30);


Finding: tasks.web.log-retention is cast to int and passed straight to putRetentionPolicy, but the API only accepts a fixed enum: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653. Reasonable-looking values like 45, 28, or 0 throw InvalidParameterException at sync time with a generic AWS error rather than a friendly manifest-validation message. Same gap exists in the alpha-era SyncIvsCloudWatchLogGroupStep so a shared helper makes sense.
_{Severity: WARNING}

Suggested Fix

Validate against the enum up front:

static $validRetentionDays = [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653]; $retention = (int) Manifest::get('tasks.web.log-retention', 30); if (! in_array($retention, $validRetentionDays, true)) { throw new IntegrityCheckException(sprintf( 'tasks.web.log-retention must be one of: %s (got %d)', implode(', ', $validRetentionDays), $retention, )); }

Extract to Helpers::validateCloudWatchLogRetention($value, $key) and call from both this step and SyncIvsCloudWatchLogGroupStep.

Prompt

Review src/Steps/Fargate/SyncTaskLogGroupStep.php#L18. A potential issue has been flagged: tasks.web.log-retention isn't validated against CloudWatch's fixed enum of accepted retention values, so values like 45 or 0 throw a generic InvalidParameterException at sync time. Propose a shared Helpers::validateCloudWatchLogRetention helper used here and in SyncIvsCloudWatchLogGroupStep.

The targeted dodge in SyncTaskSecurityGroupStep (use createSecurityGroup response GroupId directly) only fixed the in-step exposure. Downstream consumers — specifically SyncEcsServiceStep:63 calling AwsResources::ecsTaskSecurityGroup() for the awsvpcConfiguration — re-resolve the SG via securityGroupByName() → securityGroups(), which returned the pre-create cached list and threw ResourceDoesNotExistException. First-run sync:compute against a fresh environment still failed, just one step later. Strips static::\$securityGroups + the unused \$refresh parameter. The extra describeSecurityGroups call is ~50-100ms and only fires when a typed cache (loadBalancerSecurityGroup / ec2SecurityGroup / ecsTaskSecurityGroup / rdsSecurityGroup) misses, so the cost is bounded. Progressive strip per LPX-596 — the rest of the static caches stay until the broader cleanup, but \$securityGroups is the one currently biting first-run sync so it goes first. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three Manifest::get(...) values flow into AWS API payloads with unchecked PHP casts that silently coerce garbage: * (bool) Manifest::get('tasks.web.enable-execute-command') — PHP's cast on the YAML-quoted string "false" returns true, silently enabling ECS Exec (shell into running containers). * (int) Manifest::get('tasks.web.image-retention.{keep-count, untagged-days}') — non-numeric strings coerce to 0, floats truncate. ECR rejects countNumber: 0 server-side so the failure is loud but deep (deferred AWS API error rather than preflight). * (int) Manifest::get('tasks.web.log-retention') — CloudWatch only accepts a fixed enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653). Values like 45 or 28 throw InvalidParameterException at sync time. Adds three helpers on Codinglabs\Yolo\Helpers: * validatePositiveInt(value, key) — filter_var INT with min_range=1, throws IntegrityCheckException naming the offending key + value. * validateCloudWatchLogRetention(value, key) — enum check, throws with the valid set listed. * validateStrictBool(value, key) — filter_var VALIDATE_BOOLEAN with NULL_ON_FAILURE; "false" string evaluates to false (not true), garbage strings throw. Used in SyncEcsServiceStep (enable-execute-command), SyncEcrRepositoryStep (image-retention), SyncTaskLogGroupStep (log-retention). lifecyclePolicy() promoted to public static so the ECR payload shape is now also unit-testable (separate follow-up). 20 new tests pin the validators' behaviour incl. the "false"-string trap the strict-bool helper exists to close. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…account lookups Repro: `php vendor/bin/yolo sync:compute production --dry-run` against a fresh AWS account (no ECS cluster yet) ran the first 9 steps cleanly (ECR, cluster, task SG, ALB, target group, listeners, task def — all WOULD_CREATE) then aborted at step 10 with an uncaught Guzzle 400: {"__type":"ClusterNotFoundException","message":"Cluster not found."} `UsesEcs::ecsService()` called `describeServices` with no try/catch, so the SDK's `ClusterNotFoundException` (Cluster doesn't exist yet on a cold account) escaped past the calling step's `catch (ResourceDoesNotExistException)` branch — which is where the `if (dry-run) return WOULD_CREATE` lives. Mirrors the existing pattern in `ecsTaskDefinition()` at src/Concerns/UsesEcs.php — translate any AwsException to ResourceDoesNotExistException so the caller's dry-run + create branches work. Audited siblings under src/Concerns/Uses*.php: - ecsTaskDefinition, ecrRepository, targetGroup, getParameter — already wrapped, no change. - loadBalancer, dbSubnetGroup, hostedZones, certificates, securityGroups, vpc, internetGateway, routeTable, subnets, logGroup — enumerate then fall through to ResourceDoesNotExistException naturally; no parent- identifier reference that could 404. - loadBalancerListenerOnPort, listenerCertificate — call sites guarantee the parent (ALB / listener) exists before invocation; the protective ResourceDoesNotExistException fires upstream. Also drops `EnsureResourceNameLengthsStep` — AWS's native `InvalidParameterValue: Target group name cannot exceed 32 characters` error is already actionable. The step was a busywork pre-flight that added noise to sync output without meaningful ergonomic gain. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`/healthy` was a carry-over from the alpha lineage with no convention behind it. `/health` is the generic HTTP health-endpoint convention that doesn't tie YOLO's default to a specific Laravel version. (Laravel 11+ ships `/up` as its native default; consumers can override via `tasks.web.health-check.path`.) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(LPX-576, LPX-579): build + deploy commands for first Fargate POC Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: preserve yolo.yml deploy + deploy-all blocks for Fargate PR #26's first cut silently dropped manifest deploy: / deploy-all: execution — the alpha-era StartCommand ran them server-side via CodeDeploy hooks, and the new DeployCommand had no equivalent. - ExecuteDeployStepsStep (new) runs the `deploy:` block as a one-off ECS RunTask on the just-registered task definition revision, before the service is rolled. The container CMD is overridden to `sh -c "set -e; <commands>"` so a failed migration aborts the deploy with the service still on the previous revision (no traffic shift, rollback is a no-op). - GenerateEntrypointScriptStep (new) writes `.yolo/build/.yolo-entrypoint.sh` during build. The script runs each `deploy-all:` command then `exec "$@"`. The consumer's Dockerfile wires it as `ENTRYPOINT ["/app/.yolo-entrypoint.sh"]` so per-container hooks (artisan optimize etc.) fire on every task start without YOLO reaching into the container at runtime. deploy: → one-off RunTask (runs once per deploy, before traffic switch) deploy-all: → entrypoint wrapper (runs on every container start) Manifest abstraction preserved end-to-end. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: guard SyncStandaloneRecordSetStep against domain-less manifests DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: gate `yolo deploy` behind Manifest::has('tasks.web') Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: gate BuildCommand's Docker steps on Manifest::has('tasks'), not 'tasks.web' Docker build + ECR push are task-type-agnostic — the same Laravel-in-a- container image serves whatever task type the deploy targets. The runtime difference (octane:start vs queue:work vs schedule:run) lives in the task definition's `command` override, not in the image. The original `Manifest::has('tasks.web')` gate meant queue- or scheduler-only apps got the base build pipeline (rsync, composer install, npm) but no image to deploy. Broadening to `Manifest::has( 'tasks')` future-proofs without changing today's behaviour (web is still the only task type in the schema). DeployCommand's gate stays at `tasks.web` deliberately — today's DeployCommand is web-specific (it updates the ECS web service). Queue and scheduler deploys will branch or get their own commands when those task types land (LPX-580+). 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…out a public face (#29) * feat(LPX-604): headless mode — skip ALB/listeners/Route53 when no public face Detect headless apps from manifest shape (no `domain`, no `apex`, no tenant declares either) and skip every step that touches a public- facing AWS resource: ALB, target group, HTTP/HTTPS listeners, listener rule, hosted zone, SSL cert, Route 53 record set. The ECS service still runs without a load balancer association. Use cases: queue worker pools, scheduler-only apps, internal services with no public surface. Surface: - `Manifest::isHeadless()` — true when neither `domain`/`apex` is set on the solo manifest, and (for multitenant) every tenant entry lacks both keys. - `Contracts/ExecutesWebStep` — new marker for ALB-side steps. Applied to SyncLoadBalancerStep, SyncTargetGroupStep, SyncHttpListenerStep, SyncHttpsListenerStep, SyncListenerRuleStep. - `ExecutesDomainStep` finally wired into the gate (was an unused marker, flagged in the PR #26 review). Applied to SyncSoloRecordSetStep alongside its existing ExecutesSoloStep. - `ChecksIfCommandsShouldBeRunning` skips both markers under headless. - `SyncEcsServiceStep` create/update payloads drop `loadBalancers` and `healthCheckGracePeriodSeconds` when headless. Extracted `needsUpdate()` / `createPayload()` / `updatePayload()` as pure helpers — also closes the test-coverage gap the PR #25 review flagged for this step. - `Manifest::tenants()` no longer TypeErrors on a headless tenant entry — apex normalisation gracefully resolves to null instead of bombing on `str_starts_with(null, ...)`. 16 new unit tests pin the headless detector matrix, the SyncEcsServiceStep drift predicate (incl. carry-over coverage gap from PR #25 review), the headless conditional payload shape, and the tenants() normaliser's headless tolerance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(LPX-604): unify shouldBeRunning gate into one compound skip block Three step-skip conditions (solo-in-multitenant, multitenancy-in-solo, public-step-in-headless) sat as two separate blocks. Folding into one compound `if` matches the shape of the check (all are "skip when manifest mode disagrees with the marker") and makes the skip surface trivially extensible for the next marker we add. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(LPX-604): collapse ExecutesDomainStep into ExecutesWebStep Two markers for "stuff that only matters with a public face" was YAGNI — no real step is web-but-not-domain or domain-but-not-web. Headless mode skips the lot uniformly. * ExecutesWebStep is now the sole public-facing marker. * ExecutesDomainStep contract deleted. * SyncHostedZoneStep, SyncSslCertificateStep, SyncSoloRecordSetStep re-tagged with ExecutesWebStep. * ChecksIfCommandsShouldBeRunning skip-clause halves in size. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(LPX-604): split headless gate from the tenancy-mode gate The previous compound-skip block lumped three different concerns into one OR-chain. Tenancy-mode mismatch (solo step in mt app, mt step in solo app) and "is this a public-facing step in a headless app" are orthogonal — readability wins from separation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: guard AWS account mismatch in Command::execute Before any AWS-touching command runs handle(), verify the manifest's aws.account-id matches sts:GetCallerIdentity for the resolved profile. Catches the foot-gun where YOLO_<ENV>_AWS_PROFILE points at the wrong account and the manifest's ID gets used silently for ARN interpolation. One STS call per command run (~50ms). Skips when the manifest doesn't declare an account (legacy / test fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: throw IntegrityCheckException from the account-mismatch guard `ensureManifestAccountMatchesProfile()` returned bool + called error() itself, while every other manifest-validation failure in the codebase (Manifest::apex, Helpers::validate*, EnsureEnvIsConfiguredCorrectlyStep, SyncListenerRuleStep::nextAvailablePriority, the SG-rule integrity guards) throws IntegrityCheckException. Aligning with the prevailing pattern: void return, throw on either STS lookup failure or account mismatch, Symfony's exception renderer handles the styled output for free. DX wins: single responsibility (guard either succeeds or throws), the caller drops the `if (! ...) return 1` shim, stack trace available with -vvv for debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: revert account guard to bool+error() pattern, add profileAccountId helper Two pieces of feedback: 1. Throwing IntegrityCheckException from Command::execute() broke the prevailing pattern at that layer — every other early gate (missing manifest, wrong environment, missing AWS_PROFILE) uses `error() + return 1`. The IntegrityCheckException throws all live deeper, in Steps and Manifest helpers. Reverted to bool return so the four early gates look identical. 2. Was reaching past Aws::accountId() to call Manifest::get() directly. Plus the STS side had no helper. Added Aws::profileAccountId() (named for the .env profile rather than 'caller', since the value is whatever YOLO_<ENV>_AWS_PROFILE resolves to) so both sides go through Aws::*. Presence check now Manifest::has('aws.account-id') rather than the truthy-check on Aws::accountId() — the latter has a `: string` return type that TypeErrors when the key is absent, and widening to `?string` would cascade into every existing caller using sprintf/concat. InitCommand carve-out at Command::execute() short-circuits before registerAwsServices(), so new apps don't hit the guard before they've written a manifest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: require aws.account-id in the manifest The "skip when not declared" early-return was a silent foot-gun — same one the whole guard exists to prevent. A manifest missing aws.account-id would happily run against whatever YOLO_<ENV>_AWS_PROFILE resolves to, with no safety net. Now bails with a clear "must declare aws.account-id under environments.<env>" message. InitCommand short-circuits before this guard, so new apps run `yolo init` without tripping it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: require name + aws.region in manifest integrity check Split the manifest-vs-AWS reconciliation into two early gates in Command::execute(): * ensureManifestIntegrity() runs *before* registerAwsServices(). Bails when any of `name`, `aws.region`, `aws.account-id` is missing. `aws.region` is the load-bearing one — RegistersAws feeds it into every SDK client at construction time, so a null region collapses into a deep AWS SDK error. `name` drives every keyedResourceName() and silently produces invalid AWS resource names (trailing hyphen) if absent. account-id presence is now also in this group. * ensureAccountMatchesProfile() runs *after* registerAwsServices() and STS-verifies the manifest's account-id against YOLO_<ENV>_AWS_PROFILE. Renamed from ensureManifestAccountMatches- Profile (the manifest part is implied now that integrity runs upstream). Test split: CommandManifestIntegrityTest (no AWS mocks needed) and CommandAccountGuardTest (STS mock). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract ensureNameDeclared + ensureManifestKeyDeclared ensureManifestIntegrity() reads as a series of named gates short- circuited via && rather than a foreach loop with implicit semantics. Each extracted method tells you what it's checking — the loop was clever but obscured the per-key checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: collapse ensureNameDeclared into ensureManifestKeyDeclared Manifest::has() now falls back to top-level when an env-scoped lookup misses. The dedicated ensureNameDeclared() method + its bespoke error message were the only special-case for top-level keys; collapsing both down keeps the integrity check a flat list of `Manifest::has` lookups. Error message dropped the "under environments.<env>" qualifier — works uniformly for top-level `name` and env-scoped `aws.*` keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: revert Manifest::has widening, keep ensureNameDeclared separate Widening Manifest::has() to fall back to top-level was solving one caller's problem by relaxing the contract of a widely-used helper — opens subtle ambiguity if a future env-scoped key ever shares a name with a top-level one. Reverted; ensureNameDeclared() handles the top-level `name` lookup directly via Manifest::current(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop comments that restate code routedHosts() comment block restated the ternary, and the tenants normaliser comment took three lines to caption a one-line null coalesce. Both code paths read for themselves. Kept comments elsewhere that explain non-obvious *why* — un-memoised SG cache (UsesEc2), AwsException translation (UsesEcs), narrow service drift detection (SyncEcsServiceStep), headless tenant raw read (Manifest::isHeadless). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop Manifest comments that restated code * `// prefer the apex key when specified` — restated what `get(key, default)` already does. * `// apex resolves to null for headless tenants` — the `?? ($config['domain'] ?? null)` chain already shows it. Kept the one-line `// Read raw — tenants() normaliser TypeErrors on a headless tenant.` in isHeadless() — that's a real why-not-use-tenants() signal that disappears without it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

stevethomas mentioned this pull request May 19, 2026

LPX-576 + LPX-579: build + deploy commands for first Fargate POC #26

Merged

stevethomas commented May 19, 2026

View reviewed changes

stevethomas and others added 2 commits May 19, 2026 21:09

stevethomas commented May 19, 2026

View reviewed changes

stevethomas merged commit b9f85ee into main May 19, 2026
5 checks passed

stevethomas deleted the steve/lpx-578-sync-command-ecs-cluster-alb-adoption branch May 19, 2026 12:14

This was referenced May 19, 2026

LPX-602: route listener rule on domain, not apex #28

Merged

LPX-604: headless mode — skip ALB / listeners / Route53 for apps without a public face #29

Merged

Conversation

stevethomas commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this ships

TL;DR

Steps in sync:compute (in order)

Manifest shape

Manifest value validation

Other AWS plumbing

Known follow-ups (separate tickets)

Reviewer notes

Uh oh!

stevethomas left a comment

Choose a reason for hiding this comment

Review Summary

Risk Assessment

Blockers

Warnings (inline)

Notes (no inline comment)

Test coverage gaps

Naming

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevethomas left a comment

Choose a reason for hiding this comment

Re-Review Summary (post-fix)

Risk Assessment

Blockers

Warnings (inline)

Coverage gaps shipped this PR

Notes (no inline)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

stevethomas commented May 19, 2026 •

edited

Loading

Steps in `sync:compute` (in order)