LPX-578: sync:fargate command for ECS + ALB adoption#25
Conversation
Adds the sync:fargate command and steps for provisioning the Fargate compute stack: ECR repo, ECS cluster, task SG, ALB (lookup-or-create by aws.alb manifest key), target group, HTTP/HTTPS listeners, host based listener rule, task definition, and ECS service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stevethomas
left a comment
There was a problem hiding this comment.
Review Summary
Verdict: ❌ Changes Requested
Two blockers, six warnings worth landing fixes for, several notes. The blockers both hit on first run for a current dev-main consumer (CL marketing site is on dev-main right now), so they need to land before this merges.
Risk Assessment
| Category | Status |
|---|---|
| Data Integrity | ❌ Blocker |
| Security | |
| Breaking Changes | ❌ Blocker |
| Operational | |
| Test Coverage |
Blockers
-
src/Steps/Fargate/SyncTaskSecurityGroupStep.php:64— first-run create path is broken by a stale-cache bug.UsesEc2::securityGroups()memoises intostatic::$securityGroupson the failed lookup at line 23. AftercreateSecurityGroupsucceeds, the re-fetch at line 64 hits the cached pre-create list and throwsResourceDoesNotExistException. Verified by tracingUsesEc2::securityGroupByName()→UsesEc2::securityGroups()— no refresh between the two calls. Same pattern exists in the alpha-eraSyncEc2SecurityGroupStep(out of scope for this PR but worth noting). -
src/Commands/SyncCommand.php:39—SyncFargateCommandis enrolled in the top-levelsyncorchestration with no manifest gate. For any current dev-main consumer with a non-Fargate manifest (the CL marketing siteyolo.ymlis one), runningyolo sync productionnow silently provisions a new ALB (~AU$24/mo), an ECS cluster (containerInsights enabled — ongoing CloudWatch cost), an ECR repo, and a task SG. Conflicts with Steve's standing rule that infra commands need explicit confirmation. Gate withManifest::has('tasks.web')or have each Fargate step short-circuit toSKIPPEDwhentasks.webis absent.
Warnings (inline)
See inline comments on:
SyncListenerRuleStep.php:96— priority collision walk escapes the 1000-49999 floorSyncEcrRepositoryStep.php:25— no lifecycle policy, images accumulate foreverSyncTaskDefinitionStep.php:60— log group auto-created with no retention policySyncEcsServiceStep.php:38— narrow drift detection; service never adopts new task def revisionsUsesElasticLoadBalancingV2.php:42— TG name lookup change, ELBv2's 32-char Name cap unvalidatedSyncHttpsListenerStep.php:18—Manifest::apex()TypeErrors on domain-less manifests (only caught byResourceDoesNotExistException)
Notes (no inline comment)
- Existence-only checks across the Fargate suite —
SyncEcsClusterStep,SyncTargetGroupStep,SyncEcsServiceStep(mostly) all returnSYNCEDwhen a resource exists, regardless of manifest drift. Health-check path, capacity providers, network config, deployment config — all silently masked. Pattern issue, worth a follow-up that establishes a consistent drift convention. - Task definition revisions accumulate —
SyncTaskDefinitionStepre-registers a fresh revision on every sync with no payload diff. Combined with the service-not-adopting issue above, every sync churns a new revision that the service ignores. - Public subnet placement — Fargate tasks land in public subnets with
assignPublicIp: ENABLED. Costs ~AU$5.40/mo per task in IPv4 fees and diverges from standard Fargate posture. Acceptable for the alpha (no private subnets insync:networkyet) but call it out in docs. - ECR
imageTagMutability: MUTABLE— fine for:latestrolling but precludes the digest-pin / hardened-deploy path; flip toIMMUTABLEwhen the deploy command lands if it's digest-pinned. - Cert status fall-through —
SyncHttpsListenerStep:23skips on any status!== 'ISSUED'. Correct behaviour butEXPIRED/REVOKED/FAILEDget the sameSKIPPEDsignal asPENDING_VALIDATION, masking real cert problems during sync. - No teardown path — no
destroy:fargate(or equivalent) ships with this PR. Half-failed syncs leave an awful manual-cleanup checklist (cluster → service → task def revisions → listener rule → listeners → TG → ALB → task SG → ECR repo → log group). File the follow-up. - Stub + README still alpha-shaped —
stubs/yolo.yml.stubis EC2/ASG-shaped with noaws.alb,aws.ecs.security-group,ecs.cluster, ortasks.web.*representation. README's one-liner promisessyncworks end-to-end but the stub doesn't reflect it. Either update the stub or add adocs/manifest.md.
Test coverage gaps
SyncListenerRuleStep::priority()+findRuleForHosts()— zero coverage. Highest-value missing tests given the multi-tenant priority-collision risk.SyncEcsServiceStepupdate-vs-create branching — four behavioural paths, zero coverage.healthCheckGracePeriodSeconds ?? 0coalesce will silently churnupdateServicecalls on every sync against an existing service.SyncTaskSecurityGroupSteptag-key drift — read filter and write tag are coupled by string match across the same file; a small test pinning the tag key in both places would catch any future drift.- Dry-run honesty across all 10 Fargate steps — no test confirms
--dry-runactually avoids SDK writes. Trivial to add (bind mock client that throws on writes).
Naming
Separately raising with @stevethomas: sync:fargate ties the command name to the implementation when Fargate IS the implementation. sync:compute slots next to sync:network / sync:storage / sync:standalone and reclaims the v1-alpha vocabulary. Step namespace (src/Steps/Fargate/) stays Fargate-specific. Rename incoming as a follow-up commit on this branch.
🤖 Reviewed by Claude Code
| ], | ||
| ]); | ||
|
|
||
| $securityGroup = AwsResources::ecsTaskSecurityGroup(); |
There was a problem hiding this comment.
Finding: First-run create path will always throw ResourceDoesNotExistException here. UsesEc2::securityGroups() memoises into static::$securityGroups on the failed lookup at line 23 (the call inside the try block populates the cache with the pre-create list, then throws). After createSecurityGroup succeeds on line 52, the re-fetch at line 64 calls securityGroupByName() → securityGroups() again, hits the cached pre-create list, doesn't find the new SG, throws. The same pattern exists in the alpha-era SyncEc2SecurityGroupStep:110 (out of scope for this PR but worth a separate ticket).
Severity: BLOCKER
Suggested Fix
Use the GroupId returned by createSecurityGroup directly instead of re-fetching via the cached lookup:
$result = Aws::ec2()->createSecurityGroup([...]);
Aws::ec2()->authorizeSecurityGroupIngress(
static::loadBalancerIngressRule(['GroupId' => $result['GroupId']], $containerPort)
);Or force a refresh by calling AwsResources::securityGroups(refresh: true) before the re-fetch. The SDK-response-direct approach is one fewer round trip.
Prompt
Review the code at src/Steps/Fargate/SyncTaskSecurityGroupStep.php:64 alongside src/Concerns/UsesEc2.php:100-110. A potential issue has been flagged: UsesEc2::securityGroups() memoises into a static property; the first call inside this step (line 23 via ecsTaskSecurityGroup → securityGroupByName) populates the cache with the pre-create SG list, then throws. After createSecurityGroup succeeds at line 52, the re-fetch at line 64 hits the stale cache and throws ResourceDoesNotExistException, breaking first-run sync. Verify whether this is real — if so, propose a fix using the createSecurityGroup response GroupId directly; if not, explain why.
| : [ | ||
| SyncStandaloneCommand::class, | ||
| ], | ||
| SyncFargateCommand::class, |
There was a problem hiding this comment.
Finding: SyncFargateCommand is enrolled in the top-level sync orchestration unconditionally, with no manifest guard. For any current dev-main consumer with a non-Fargate manifest (the CL marketing site yolo.yml matches this shape today), running yolo sync production will silently provision an ALB (~AU$24/mo), an ECS cluster with containerInsights enabled (CloudWatch cost), an ECR repo, and a task SG that they didn't ask for. Conflicts with Steve's standing rule that infra commands need explicit consent — currently it'd run on the first sync after upgrading their dev-main pin without any opt-in. Also: SyncHttpsListenerStep:18 and SyncListenerRuleStep:19 both call Manifest::apex() (return-typed : string) inside a try/catch that only catches ResourceDoesNotExistException, so a domain-less manifest crashes the whole sync run with an unhandled TypeError instead of cleanly skipping.
Severity: BLOCKER
Suggested Fix
Gate the enrolment behind a manifest opt-in:
collect([
SyncNetworkCommand::class,
SyncStorageCommand::class,
...Manifest::isMultitenanted() ? [...] : [SyncStandaloneCommand::class],
...Manifest::has('tasks.web') ? [SyncFargateCommand::class] : [],
SyncIamCommand::class,
SyncLoggingCommand::class,
])->each(...);OR have each Fargate step return StepResult::SKIPPED when tasks.web is absent. The first form is cheaper and matches the existing multitenancy gating shape just above. Pairs naturally with the sync:compute rename — same gate, new command name.
Prompt
Review the code at src/Commands/SyncCommand.php:39. A potential issue has been flagged: SyncFargateCommand runs unconditionally as part of the top-level sync, so any current dev-main consumer with a non-Fargate manifest (e.g. CL marketing site) will silently get a new ALB, ECS cluster, ECR repo, and task SG provisioned without consent. Also crashes on domain-less manifests because SyncHttpsListenerStep/SyncListenerRuleStep call the return-typed Manifest::apex() inside a try/catch that only catches ResourceDoesNotExistException. Verify and propose either a manifest gate on enrolment or per-step SKIPPED short-circuits.
| while (in_array($base, $usedPriorities, true)) { | ||
| $base = ($base % 50000) + 1; | ||
| } | ||
|
|
There was a problem hiding this comment.
Finding: The collision walk $base = ($base % 50000) + 1 can produce priorities below 1000 (the intended floor) and walk into priority slots typically reserved for manual high-priority overrides — base=49999 collides, next is 1, then 2, then 3, etc. Also: no bounded termination, so if all 50000 slots are taken the loop spins forever. Additionally describeRules is called twice per invocation (once in findRuleForHosts(), again here) — minor API pressure, but in multi-tenant fan-out it doubles ELB describe TPS pressure (10 TPS cap per region).
Severity: WARNING
Suggested Fix
Stay bounded within the intended range and bail on exhaustion:
$base = (abs(crc32(Helpers::keyedResourceName(exclusive: true))) % 49000) + 1000;
$attempts = 0;
while (in_array($base, $usedPriorities, true)) {
if (++$attempts > 49000) {
throw new IntegrityCheckException('ALB listener rule priority space exhausted');
}
$base = $base >= 49999 ? 1000 : $base + 1;
}Add a unit test pinning: priority always within 1000..49999, deterministic for a given app name when no collision, walks correctly when collisions occur.
Prompt
Review the code at src/Steps/Fargate/SyncListenerRuleStep.php:82-99. A potential issue has been flagged: (a) the collision walk `$base = ($base % 50000) + 1` escapes the intended 1000-49999 floor and can land on priority 1, conflicting with manually-set rules; (b) no bounded termination — exhausted space loops forever; (c) describeRules is called twice per invocation. Verify and propose a bounded walk + a unit test for priority() and findRuleForHosts() using a Container-bound mock elasticLoadBalancingV2 client.
| return StepResult::WOULD_CREATE; | ||
| } | ||
|
|
||
| Aws::ecr()->createRepository([ |
There was a problem hiding this comment.
Finding: ECR repository is created with no lifecycle policy. Every build will push a new tag (PR #26 pushes :{app-version} + :latest), and ECR keeps everything forever by default. Default per-repo image limit is 10k images (raisable but a hard ceiling), and storage is billed at US$0.10/GB-mo per image layer. For LP's multi-tenant footprint this gets noticeable fast.
Severity: WARNING
Suggested Fix
Follow up the createRepository call with a putLifecyclePolicy setting a sensible default (configurable via manifest):
Aws::ecr()->putLifecyclePolicy([
'repositoryName' => AwsResources::ecrRepositoryName(),
'lifecyclePolicyText' => json_encode([
'rules' => [
[
'rulePriority' => 1,
'description' => 'Expire untagged images after 7 days',
'selection' => ['tagStatus' => 'untagged', 'countType' => 'sinceImagePushed', 'countUnit' => 'days', 'countNumber' => 7],
'action' => ['type' => 'expire'],
],
[
'rulePriority' => 2,
'description' => 'Keep last 30 tagged images',
'selection' => ['tagStatus' => 'any', 'countType' => 'imageCountMoreThan', 'countNumber' => 30],
'action' => ['type' => 'expire'],
],
],
]),
]);Prompt
Review the code at src/Steps/Fargate/SyncEcrRepositoryStep.php:19-30. A potential issue has been flagged: ECR repository is created with no lifecycle policy — images accumulate forever and eventually hit the soft cap of 10k per repo plus unbounded storage cost. Propose a default putLifecyclePolicy (expire untagged after 7d, keep last 30 tagged) overridable via manifest, e.g. tasks.web.image-retention.
| 'awslogs-stream-prefix' => 'web', | ||
| 'awslogs-create-group' => 'true', | ||
| ], | ||
| ], |
There was a problem hiding this comment.
Finding: awslogs-create-group: 'true' lets the awslogs driver auto-create the log group on first task launch. The auto-created group has no retention policy (logs never expire), which means log storage costs grow linearly forever. Worse, the group ends up tagged/owned outside YOLO's normal sync convention — there's no AwsResources::logGroup() lookup pointing at it for future drift detection.
Severity: WARNING
Suggested Fix
Pre-create the log group explicitly via a new SyncTaskLogGroupStep (added to SyncFargateCommand::$steps before SyncTaskDefinitionStep), with a configurable retention from tasks.web.log-retention (default e.g. 30 days). Then flip awslogs-create-group: 'false' here so the driver doesn't shadow-own the group.
Prompt
Review the code at src/Steps/Fargate/SyncTaskDefinitionStep.php:52-62. A potential issue has been flagged: awslogs-create-group: 'true' lets the awslogs driver auto-create the CloudWatch log group with infinite retention, growing log storage costs forever. Propose a new SyncTaskLogGroupStep that pre-creates the group with a configurable retention (default 30 days) and disables awslogs-create-group here.
| 'cluster' => AwsResources::ecsClusterName(), | ||
| 'service' => AwsResources::ecsServiceName(), | ||
| 'desiredCount' => $desiredCount, | ||
| 'healthCheckGracePeriodSeconds' => $gracePeriod, |
There was a problem hiding this comment.
Finding: Drift detection is too narrow. Only desiredCount and healthCheckGracePeriodSeconds are diffed; everything else (taskDefinition revision, network config, capacity provider strategy, deployment config, execute-command toggle) is silently masked as SYNCED. Combined with SyncTaskDefinitionStep always registering a new revision per sync, this means: every sync churns a new task def revision that the service never adopts. Service stays pinned to whichever revision it was originally created with, accumulating an orphan trail of registered-but-unused revisions.
Also: the healthCheckGracePeriodSeconds ?? 0 coalesce will report drift on every sync when AWS returns a service without that field set (older create lineage) against a manifest default of 60 — silently churning updateService calls.
Severity: WARNING
Suggested Fix
Decide whether sync or deploy owns task-def revision adoption. If deploy (matches PR #26), add an explicit comment here: // task definition revision is managed by deploy command, not sync. Either way, widen the drift detection — at minimum check taskDefinition ARN against the latest revision, and consider extracting a payloadHasDifferences helper similar to Helpers::payloadHasDifferences() already used in SyncEc2SecurityGroupStep.
For the coalesce churn: pull healthCheckGracePeriodSeconds default from the manifest only once and compare against ?? of the same default, not 0.
Prompt
Review the code at src/Steps/Fargate/SyncEcsServiceStep.php:20-41 alongside SyncTaskDefinitionStep.php. A potential issue has been flagged: drift detection only covers desiredCount + healthCheckGracePeriodSeconds. The service never adopts new task definition revisions registered by SyncTaskDefinitionStep, accumulating orphan revisions. The `healthCheckGracePeriodSeconds ?? 0` coalesce also reports drift on every sync against older services. Verify intent (does deploy own revision adoption?) and either widen drift detection or document the create-only contract.
| } | ||
|
|
||
| $targetGroups = Aws::elasticLoadBalancingV2()->describeTargetGroups(); | ||
| $name = Helpers::keyedResourceName(exclusive: true); |
There was a problem hiding this comment.
Finding: Target group name lookup switched from Helpers::keyedResourceName(exclusive: false) (= yolo-{env}, shared across apps) to exclusive: true (= yolo-{env}-{app}, per-app). PR body confirms no current consumer relies on the old name. Two gotchas worth pinning:
- ELBv2 Name cap is 32 chars.
yolo-production-is 16 chars; that leaves 16 chars for the app name. Anything ≥17 chars hits AWS validation at create time with no upfront manifest precheck. Add validation inManifestor in this concern. - No regression test pins the new name. A 5-line test asserting the
Names => [...]arg shape passed todescribeTargetGroupswould document the intent and catch any future drift back to the old shared name.
Severity: WARNING
Suggested Fix
Add a manifest precheck (somewhere in Command::execute() or as a new Steps/Ensures/EnsureResourceNameLengths step):
if (strlen(Helpers::keyedResourceName(exclusive: true)) > 32) {
throw new IntegrityCheckException(sprintf(
"Target group name 'yolo-%s-%s' exceeds ELB's 32-char limit. Shorten the app name to %d chars or less.",
Helpers::environment(),
Manifest::name(),
32 - strlen("yolo-" . Helpers::environment() . "-")
));
}And a regression test under tests/Unit/Concerns/UsesElasticLoadBalancingV2Test.php pinning the lookup name shape.
Prompt
Review the code at src/Concerns/UsesElasticLoadBalancingV2.php:36-63. A potential issue has been flagged: the lookup naming changed from exclusive:false to exclusive:true. ELBv2's 32-char Name cap means app names ≥17 chars will fail at create time with no upfront precheck. Add a manifest validation pass (early-fail with a clear message) plus a regression test pinning the Names arg shape passed to describeTargetGroups.
| public function __invoke(array $options): StepResult | ||
| { | ||
| try { | ||
| $certificate = AwsResources::certificate(Manifest::apex()); |
There was a problem hiding this comment.
Finding: Manifest::apex() has a : string return type and will TypeError (not throw ResourceDoesNotExistException) when neither apex nor domain is set in the manifest. The try/catch only catches ResourceDoesNotExistException, so a domain-less Fargate-shaped manifest blows up the whole sync run with an unhandled TypeError instead of cleanly skipping. Same shape in SyncListenerRuleStep.php:19.
Severity: WARNING
Suggested Fix
Guard with a manifest presence check that returns SKIPPED:
if (! Manifest::has('apex') && ! Manifest::has('domain')) {
return StepResult::SKIPPED;
}OR change Manifest::apex() to throw a catchable IntegrityCheckException (or similar) when both keys are absent — but that's a wider blast radius affecting alpha-era code, so the per-step guard is the safer fix for this PR.
Prompt
Review the code at src/Steps/Fargate/SyncHttpsListenerStep.php:17-21 and src/Steps/Fargate/SyncListenerRuleStep.php:19-23. A potential issue has been flagged: Manifest::apex() return type is `: string` and will TypeError on domain-less manifests, but the try/catch only catches ResourceDoesNotExistException. Verify and propose either a Manifest::has() guard returning SKIPPED, or a wider change making Manifest::apex() throw a catchable exception.
`sync:fargate` ties the command name to the implementation when Fargate *is* the implementation. `sync:compute` slots cleanly next to `sync:network` / `sync:storage` / `sync:standalone` / `sync:iam` / `sync:logging` and reclaims the v1-alpha vocabulary so anyone reading old docs lands in the right place. Public surface is the command name — durable. Step namespace (`src/Steps/Fargate/`) stays Fargate-specific because the steps *are* concretely Fargate (SyncEcsClusterStep, SyncTargetGroupStep). If a second compute backend ever materialises, add `src/Steps/X/` and the command picks the step set by manifest shape; the user-facing command stays put. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blockers:
- SyncTaskSecurityGroupStep: use createSecurityGroup response GroupId
directly instead of re-fetching through UsesEc2::securityGroups(),
which memoises pre-create state into a static property and would
throw `Could not find Security Group matching name yolo-...-ecs-task-
security-group` on every first-run sync.
- SyncCommand: gate SyncComputeCommand enrolment behind
`Manifest::has('tasks.web')` so existing non-Fargate consumers don't
silently get a brand-new ALB / ECS cluster / ECR repo provisioned on
their next `yolo sync`.
Warnings:
- SyncListenerRuleStep::nextAvailablePriority(): extract the pure walk
out of priority() and bound it to 1000-49999, wrap floor↔ceiling on
collision rather than escaping into the reserved <1000 range, and
bail with IntegrityCheckException after a full sweep instead of
spinning forever. 6 new unit tests pin the contract.
- SyncEcrRepositoryStep: putLifecyclePolicy after createRepository
with sensible defaults (expire untagged after 7d, keep last 30
tagged), overridable via `tasks.web.image-retention.{keep-count,
untagged-days}`.
- SyncTaskLogGroupStep (new): pre-create the CloudWatch log group
before the task definition is registered, with a configurable
retention (`tasks.web.log-retention`, default 30 days). Drops the
`awslogs-create-group: true` driver flag from the task definition so
the awslogs driver doesn't shadow-own the group.
- SyncEcsServiceStep: replace `?? 0` with `?? $gracePeriod` so older
services without `healthCheckGracePeriodSeconds` set don't churn an
updateService call on every sync. Comment makes the
sync-vs-deploy ownership of task-def revisions explicit.
- EnsureResourceNameLengthsStep (new): manifest precheck for AWS's
32-char Name cap on target groups + ECS services. Fails fast with
a clear message naming the offending key and the allowable length.
- SyncHttpsListenerStep + SyncListenerRuleStep: short-circuit to
SKIPPED when neither `apex` nor `domain` is set, instead of
TypeErroring on Manifest::apex()'s `: string` return type.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stevethomas
left a comment
There was a problem hiding this comment.
Re-Review Summary (post-fix)
Verdict: ❌ Changes Requested
The first-round blockers + 6 warnings are addressed cleanly, but the re-review surfaced a new blocker that the original SG-cache fix didn't reach, plus a cluster of validation gaps that ride the same Manifest::get(..., default) | (int)/(bool) pattern.
Risk Assessment
| Category | Status |
|---|---|
| Data Integrity | ❌ Blocker (downstream stale-cache hit) |
| Security | |
| Breaking Changes | |
| Operational | |
| Test Coverage |
Blockers
src/Steps/Fargate/SyncEcsServiceStep.php:63— the original SG stale-cache fix only resolved the same-step exposure inSyncTaskSecurityGroupStep. The downstream consumerSyncEcsServiceStep's create branch callsAwsResources::ecsTaskSecurityGroup()['GroupId']for theawsvpcConfiguration.static::$ecsTaskSecurityGroupis unset (the catch in the SG step never seeded it), so it falls through tosecurityGroupByName()→securityGroups()which still returns the pre-create cached list. ThrowsResourceDoesNotExistException, aborts sync mid-create. First-runsync:computestill fails — just one step later.
Warnings (inline)
SyncEcsServiceStep.php:69—(bool) Manifest::get('tasks.web.enable-execute-command', false): PHP's(bool)cast on the YAML-quoted string"false"evaluates totrue, silently enabling ECS Exec (shell into running tasks). Security-sensitive footgun.SyncEcrRepositoryStep.php:42—(int)casts onkeep-count/untagged-days: garbage strings coerce to0(e.g.keep-count: thirty); floats truncate silently (keep-count: 1.7→1). AWS rejectscountNumber: 0server-side so it fails loud, but the failure surface is a deep AWS error rather than a clean preflight. Validate>= 1up front.SyncTaskLogGroupStep.php:18—tasks.web.log-retentioncast to int with no validation, but CloudWatch'sputRetentionPolicyonly accepts a fixed enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653). Reasonable-looking values like45or0throwInvalidParameterExceptionat sync time.src/Steps/Deploy/SyncStandaloneRecordSetStep.php:19—Manifest::apex()TypeError exposure re-opens throughDeployCommand(PR #26 stack base) for Fargate-only domain-less manifests. Same fix pattern as theSyncHttpsListenerStep/SyncListenerRuleStepguard already applied.stubs/yolo.yml.stub:1— original review WARNING #4 unaddressed. Still EC2/ASG-shaped withaws.autoscaling/aws.ec2blocks and notasks.web. A freshyolo initproduces a manifest wheresyncskips Fargate provisioning entirely.tests/Unit/Steps/Fargate/SyncListenerRuleStepTest.php:30— the "wraps from the ceiling back to the floor" test asserts< 49000 && >= 1000but doesn't prove the floor-jump branch executed. A name whose CRC-derived base lands below 49000 already passes via the unrelated baseline. Tighten by computing or seeding a name that forces base==49999, then assert result == 1000 exactly.
Coverage gaps shipped this PR
SyncTaskLogGroupStep(new) — three branches (existing+match, existing+drift, missing→create), zero tests. At minimum a payload-pinning test onlogGroupName().SyncEcrRepositoryStep::lifecyclePolicy()(new) — manifest-driven JSON shape, zero coverage. Mirror theSyncTaskDefinitionStepTestpattern.EnsureResourceNameLengthsStep(new) — boundary (32 chars exactly) not pinned. 3-case test would catch a future>↔>=flip.SyncEcsServiceStepupdate-vs-create (carry-over) — still zero coverage on the diff predicate + create payload shape.
Notes (no inline)
describeLogGroupsno pagination —findLogGroup()returns the first page only (default 50). Low-probability collision today but ergonomically annoying when it happens. Mirror ontoUsesCloudWatchLogs::logGroup()which has the same gap.SyncComputeCommanddirect invocation —yolo sync:compute productionagainst a non-tasks.webmanifest bypasses theSyncCommandgate and provisions Fargate infra with all defaults. Either guard the sub-command or accept as power-user drift.EnsureResourceNameLengthsStepformat check — covers 32-char length, not AWS Name format rules (alphanumeric + hyphens, no leading/trailing/consecutive hyphens). Out-of-format names still surface as deep AWS errors.Aws::tags(..., 'tags', associative: true)['tags']round-trip inSyncTaskLogGroupStep— wraps then immediately unwraps. Cleaner with a dedicatedAws::associativeTags()helper.- First-pass notes still applicable —
assignPublicIp: ENABLED,executionRoleArnbare default,imageTagMutability: MUTABLE, HTTPS certStatus !== 'ISSUED'fall-through — all unchanged, all tracked, no new exposure. - Carry-over coverage debt —
SyncTaskSecurityGroupSteptag-key drift coupling,Aws::ecsTags()no direct test, dry-run honesty across the suite. None blocking; worth a tracking issue.
🤖 Reviewed by Claude Code
| 'networkConfiguration' => [ | ||
| 'awsvpcConfiguration' => [ | ||
| 'subnets' => AwsResources::publicSubnetIds(), | ||
| 'securityGroups' => [AwsResources::ecsTaskSecurityGroup()['GroupId']], |
There was a problem hiding this comment.
Finding: The original blocker fix in SyncTaskSecurityGroupStep:64 (use createSecurityGroup response GroupId directly) only dodged the stale-cache exposure inside that step. This line calls AwsResources::ecsTaskSecurityGroup() to populate the service's awsvpcConfiguration. static::$ecsTaskSecurityGroup is never set in the catch path of the SG step, so the lookup falls through to securityGroupByName() → UsesEc2::securityGroups() which still returns the pre-create cached list (populated by the initial failed lookup on SyncTaskSecurityGroupStep:23). The newly-created SG isn't in that list, so this throws ResourceDoesNotExistException and aborts sync mid-create. First-run sync:compute still fails — just one step later.
Severity: BLOCKER
Suggested Fix
Seed the typed static cache inline after createSecurityGroup succeeds. Add a public setter to UsesEc2 (or a slim setEcsTaskSecurityGroup(array $sg) helper), then in SyncTaskSecurityGroupStep after the create + authorize:
$created = Aws::ec2()->createSecurityGroup([...]);
Aws::ec2()->authorizeSecurityGroupIngress(
static::loadBalancerIngressRule(['GroupId' => $created['GroupId']], $containerPort)
);
AwsResources::setEcsTaskSecurityGroup(['GroupId' => $created['GroupId']]);Downstream consumers only read ['GroupId'] so the minimal stub is sufficient. Belt-and-braces alternative is AwsResources::securityGroups(refresh: true) after create — costs an extra describe call but invalidates the whole pre-create cache. The targeted setter is cheaper.
Broader fix (kill the static caches entirely) is tracked in LPX-596.
Prompt
Review the code at src/Steps/Fargate/SyncEcsServiceStep.php#L63 alongside src/Concerns/UsesEc2.php#L158-L172 and src/Steps/Fargate/SyncTaskSecurityGroupStep.php#L45-L70. A potential issue has been flagged: the first-round SG cache fix dodged the stale-cache exposure inside SyncTaskSecurityGroupStep, but SyncEcsServiceStep's create branch re-resolves the same SG via AwsResources::ecsTaskSecurityGroup() which falls through to the still-stale UsesEc2::securityGroups() cache. Verify whether first-run sync:compute against a fresh environment will throw ResourceDoesNotExistException here. If real, propose seeding static::$ecsTaskSecurityGroup with the createSecurityGroup response GroupId via a setter on the trait.
| ], | ||
| 'loadBalancers' => [ | ||
| [ | ||
| 'targetGroupArn' => AwsResources::targetGroup()['TargetGroupArn'], |
There was a problem hiding this comment.
Finding: (bool) Manifest::get('tasks.web.enable-execute-command', false) — PHP's (bool) cast on the YAML-quoted string "false" evaluates to true. YAML's loose-typing rules mean a manifest line like enable-execute-command: "false" (quoted by accident or by editor auto-formatting) silently enables ECS Exec, granting shell access into running containers to anyone holding ecs:ExecuteCommand. Security-sensitive enough to warrant a strict parse.
Severity: WARNING
Suggested Fix
Use strict boolean parsing:
'enableExecuteCommand' => filter_var(
Manifest::get('tasks.web.enable-execute-command', false),
FILTER_VALIDATE_BOOLEAN,
FILTER_NULL_ON_FAILURE,
) === true,Or require === true so only the YAML boolean true enables it.
Prompt
Review src/Steps/Fargate/SyncEcsServiceStep.php#L69. A potential issue has been flagged: PHP's (bool) cast on the string "false" returns true, so a YAML-quoted enable-execute-command: "false" would silently enable ECS Exec. Verify whether the YAML loader prevents the misquote case; if not, propose strict boolean parsing.
| } | ||
| } | ||
|
|
||
| protected static function lifecyclePolicy(): string |
There was a problem hiding this comment.
Finding: (int) casts on tasks.web.image-retention.keep-count and untagged-days silently coerce garbage. keep-count: thirty → 0 (string → int). keep-count: 1.7 → 1 (float truncation). AWS server-side rejects countNumber: 0, so a typo fails noisily — but the failure surfaces as a deep AWS API error at sync time, not a clean preflight. Worth a lower-bound + numeric validation up front.
Severity: WARNING
Suggested Fix
Validate via filter_var with FILTER_VALIDATE_INT and a min_range option, throwing IntegrityCheckException on invalid input:
$keepCount = static::validatePositiveInt(
Manifest::get('tasks.web.image-retention.keep-count', 30),
'tasks.web.image-retention.keep-count',
);
$untaggedDays = static::validatePositiveInt(
Manifest::get('tasks.web.image-retention.untagged-days', 7),
'tasks.web.image-retention.untagged-days',
);Pair with a unit test pinning the validation contract (test coverage warning above).
Prompt
Review src/Steps/Fargate/SyncEcrRepositoryStep.php#L42. A potential issue has been flagged: (int) casts on keep-count and untagged-days accept non-numeric strings (→ 0), floats, negatives silently. Propose a positive-int validator that throws IntegrityCheckException with a clear message before the AWS round trip.
| public function __invoke(array $options): StepResult | ||
| { | ||
| $name = static::logGroupName(); | ||
| $retention = (int) Manifest::get('tasks.web.log-retention', 30); |
There was a problem hiding this comment.
Finding: tasks.web.log-retention is cast to int and passed straight to putRetentionPolicy, but the API only accepts a fixed enum: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653. Reasonable-looking values like 45, 28, or 0 throw InvalidParameterException at sync time with a generic AWS error rather than a friendly manifest-validation message. Same gap exists in the alpha-era SyncIvsCloudWatchLogGroupStep so a shared helper makes sense.
Severity: WARNING
Suggested Fix
Validate against the enum up front:
static $validRetentionDays = [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653];
$retention = (int) Manifest::get('tasks.web.log-retention', 30);
if (! in_array($retention, $validRetentionDays, true)) {
throw new IntegrityCheckException(sprintf(
'tasks.web.log-retention must be one of: %s (got %d)',
implode(', ', $validRetentionDays),
$retention,
));
}Extract to Helpers::validateCloudWatchLogRetention($value, $key) and call from both this step and SyncIvsCloudWatchLogGroupStep.
Prompt
Review src/Steps/Fargate/SyncTaskLogGroupStep.php#L18. A potential issue has been flagged: tasks.web.log-retention isn't validated against CloudWatch's fixed enum of accepted retention values, so values like 45 or 0 throw a generic InvalidParameterException at sync time. Propose a shared Helpers::validateCloudWatchLogRetention helper used here and in SyncIvsCloudWatchLogGroupStep.
The targeted dodge in SyncTaskSecurityGroupStep (use createSecurityGroup response GroupId directly) only fixed the in-step exposure. Downstream consumers — specifically SyncEcsServiceStep:63 calling AwsResources::ecsTaskSecurityGroup() for the awsvpcConfiguration — re-resolve the SG via securityGroupByName() → securityGroups(), which returned the pre-create cached list and threw ResourceDoesNotExistException. First-run sync:compute against a fresh environment still failed, just one step later. Strips static::\$securityGroups + the unused \$refresh parameter. The extra describeSecurityGroups call is ~50-100ms and only fires when a typed cache (loadBalancerSecurityGroup / ec2SecurityGroup / ecsTaskSecurityGroup / rdsSecurityGroup) misses, so the cost is bounded. Progressive strip per LPX-596 — the rest of the static caches stay until the broader cleanup, but \$securityGroups is the one currently biting first-run sync so it goes first. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Manifest::get(...) values flow into AWS API payloads with
unchecked PHP casts that silently coerce garbage:
* (bool) Manifest::get('tasks.web.enable-execute-command') — PHP's
cast on the YAML-quoted string "false" returns true, silently
enabling ECS Exec (shell into running containers).
* (int) Manifest::get('tasks.web.image-retention.{keep-count,
untagged-days}') — non-numeric strings coerce to 0, floats truncate.
ECR rejects countNumber: 0 server-side so the failure is loud but
deep (deferred AWS API error rather than preflight).
* (int) Manifest::get('tasks.web.log-retention') — CloudWatch only
accepts a fixed enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180,
365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653). Values
like 45 or 28 throw InvalidParameterException at sync time.
Adds three helpers on Codinglabs\Yolo\Helpers:
* validatePositiveInt(value, key) — filter_var INT with min_range=1,
throws IntegrityCheckException naming the offending key + value.
* validateCloudWatchLogRetention(value, key) — enum check, throws
with the valid set listed.
* validateStrictBool(value, key) — filter_var VALIDATE_BOOLEAN with
NULL_ON_FAILURE; "false" string evaluates to false (not true),
garbage strings throw.
Used in SyncEcsServiceStep (enable-execute-command),
SyncEcrRepositoryStep (image-retention), SyncTaskLogGroupStep
(log-retention). lifecyclePolicy() promoted to public static so the
ECR payload shape is now also unit-testable (separate follow-up).
20 new tests pin the validators' behaviour incl. the "false"-string
trap the strict-bool helper exists to close.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…account lookups
Repro: `php vendor/bin/yolo sync:compute production --dry-run` against
a fresh AWS account (no ECS cluster yet) ran the first 9 steps cleanly
(ECR, cluster, task SG, ALB, target group, listeners, task def — all
WOULD_CREATE) then aborted at step 10 with an uncaught Guzzle 400:
{"__type":"ClusterNotFoundException","message":"Cluster not found."}
`UsesEcs::ecsService()` called `describeServices` with no try/catch,
so the SDK's `ClusterNotFoundException` (Cluster doesn't exist yet
on a cold account) escaped past the calling step's
`catch (ResourceDoesNotExistException)` branch — which is where the
`if (dry-run) return WOULD_CREATE` lives.
Mirrors the existing pattern in `ecsTaskDefinition()` at
src/Concerns/UsesEcs.php — translate any AwsException to
ResourceDoesNotExistException so the caller's dry-run + create branches
work.
Audited siblings under src/Concerns/Uses*.php:
- ecsTaskDefinition, ecrRepository, targetGroup, getParameter — already
wrapped, no change.
- loadBalancer, dbSubnetGroup, hostedZones, certificates, securityGroups,
vpc, internetGateway, routeTable, subnets, logGroup — enumerate then
fall through to ResourceDoesNotExistException naturally; no parent-
identifier reference that could 404.
- loadBalancerListenerOnPort, listenerCertificate — call sites guarantee
the parent (ALB / listener) exists before invocation; the protective
ResourceDoesNotExistException fires upstream.
Also drops `EnsureResourceNameLengthsStep` — AWS's native
`InvalidParameterValue: Target group name cannot exceed 32 characters`
error is already actionable. The step was a busywork pre-flight that
added noise to sync output without meaningful ergonomic gain.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`/healthy` was a carry-over from the alpha lineage with no convention behind it. `/health` is the generic HTTP health-endpoint convention that doesn't tie YOLO's default to a specific Laravel version. (Laravel 11+ ships `/up` as its native default; consumers can override via `tasks.web.health-check.path`.) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(LPX-576, LPX-579): build + deploy commands for first Fargate POC Stacked on PR #25. Closes the loop from `sync:fargate` plumbing to a working end-to-end deploy: - `build` appends three Docker steps when the manifest sets `tasks.web`: ECR login (SDK token → `docker login --password-stdin`), `docker build` on the prepared `.yolo/build/` context, then push both `:{app-version}` and `:latest`. `tasks.web.dockerfile` overrides the default `Dockerfile` path; `tasks.web.platform` defaults to `linux/amd64` so Apple Silicon builds land on Fargate without `--platform` ceremony. - `deploy` registers a new task definition revision pinning `:{app-version}` (DRY-reuses `SyncTaskDefinitionStep::payload(?string)` with a new optional tag arg), then `updateService` with `forceNewDeployment: true` so ECS rolls the running tasks. `--watch` blocks on the `ServicesStable` waiter. Falls back to reading `.yolo/build/APP_VERSION` when `--app-version` isn't passed, so `yolo build production && yolo deploy production` Just Works. - Reuses existing `SyncStandaloneRecordSetStep` / `SyncMultitenancyRecordSetStep` for the ALB → Route 53 alias. `ChecksIfCommandsShouldBeRunning` already gates them on tenancy mode. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: preserve yolo.yml deploy + deploy-all blocks for Fargate PR #26's first cut silently dropped manifest deploy: / deploy-all: execution — the alpha-era StartCommand ran them server-side via CodeDeploy hooks, and the new DeployCommand had no equivalent. - ExecuteDeployStepsStep (new) runs the `deploy:` block as a one-off ECS RunTask on the just-registered task definition revision, before the service is rolled. The container CMD is overridden to `sh -c "set -e; <commands>"` so a failed migration aborts the deploy with the service still on the previous revision (no traffic shift, rollback is a no-op). - GenerateEntrypointScriptStep (new) writes `.yolo/build/.yolo-entrypoint.sh` during build. The script runs each `deploy-all:` command then `exec "$@"`. The consumer's Dockerfile wires it as `ENTRYPOINT ["/app/.yolo-entrypoint.sh"]` so per-container hooks (artisan optimize etc.) fire on every task start without YOLO reaching into the container at runtime. deploy: → one-off RunTask (runs once per deploy, before traffic switch) deploy-all: → entrypoint wrapper (runs on every container start) Manifest abstraction preserved end-to-end. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: guard SyncStandaloneRecordSetStep against domain-less manifests DeployCommand unconditionally enrols this step, but Manifest::apex() has a `: string` return type and TypeErrors when neither `apex` nor `domain` is set in the manifest — a valid v1 configuration for a Fargate POC sitting behind the default ALB DNS. Mirrors the same SKIPPED early-return guard already applied to SyncHttpsListenerStep and SyncListenerRuleStep on PR #25. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: gate `yolo deploy` behind Manifest::has('tasks.web') Mirrors PR #25's SyncCommand gate. Without it, a non-Fargate consumer running `yolo deploy production` would unconditionally execute the six Fargate-flavoured steps — registering a stray ECS task definition revision, then crashing at UpdateEcsServiceStep with ServiceNotFoundException. Partial state + opaque error. The gate matches the existing entry-message pattern (early `error()` + FAILURE return) so the failure surface is operator-friendly: clear message, no AWS-side side effects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: gate BuildCommand's Docker steps on Manifest::has('tasks'), not 'tasks.web' Docker build + ECR push are task-type-agnostic — the same Laravel-in-a- container image serves whatever task type the deploy targets. The runtime difference (octane:start vs queue:work vs schedule:run) lives in the task definition's `command` override, not in the image. The original `Manifest::has('tasks.web')` gate meant queue- or scheduler-only apps got the base build pipeline (rsync, composer install, npm) but no image to deploy. Broadening to `Manifest::has( 'tasks')` future-proofs without changing today's behaviour (web is still the only task type in the schema). DeployCommand's gate stays at `tasks.web` deliberately — today's DeployCommand is web-specific (it updates the ECS web service). Queue and scheduler deploys will branch or get their own commands when those task types land (LPX-580+). 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…out a public face (#29) * feat(LPX-604): headless mode — skip ALB/listeners/Route53 when no public face Detect headless apps from manifest shape (no `domain`, no `apex`, no tenant declares either) and skip every step that touches a public- facing AWS resource: ALB, target group, HTTP/HTTPS listeners, listener rule, hosted zone, SSL cert, Route 53 record set. The ECS service still runs without a load balancer association. Use cases: queue worker pools, scheduler-only apps, internal services with no public surface. Surface: - `Manifest::isHeadless()` — true when neither `domain`/`apex` is set on the solo manifest, and (for multitenant) every tenant entry lacks both keys. - `Contracts/ExecutesWebStep` — new marker for ALB-side steps. Applied to SyncLoadBalancerStep, SyncTargetGroupStep, SyncHttpListenerStep, SyncHttpsListenerStep, SyncListenerRuleStep. - `ExecutesDomainStep` finally wired into the gate (was an unused marker, flagged in the PR #26 review). Applied to SyncSoloRecordSetStep alongside its existing ExecutesSoloStep. - `ChecksIfCommandsShouldBeRunning` skips both markers under headless. - `SyncEcsServiceStep` create/update payloads drop `loadBalancers` and `healthCheckGracePeriodSeconds` when headless. Extracted `needsUpdate()` / `createPayload()` / `updatePayload()` as pure helpers — also closes the test-coverage gap the PR #25 review flagged for this step. - `Manifest::tenants()` no longer TypeErrors on a headless tenant entry — apex normalisation gracefully resolves to null instead of bombing on `str_starts_with(null, ...)`. 16 new unit tests pin the headless detector matrix, the SyncEcsServiceStep drift predicate (incl. carry-over coverage gap from PR #25 review), the headless conditional payload shape, and the tenants() normaliser's headless tolerance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(LPX-604): unify shouldBeRunning gate into one compound skip block Three step-skip conditions (solo-in-multitenant, multitenancy-in-solo, public-step-in-headless) sat as two separate blocks. Folding into one compound `if` matches the shape of the check (all are "skip when manifest mode disagrees with the marker") and makes the skip surface trivially extensible for the next marker we add. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(LPX-604): collapse ExecutesDomainStep into ExecutesWebStep Two markers for "stuff that only matters with a public face" was YAGNI — no real step is web-but-not-domain or domain-but-not-web. Headless mode skips the lot uniformly. * ExecutesWebStep is now the sole public-facing marker. * ExecutesDomainStep contract deleted. * SyncHostedZoneStep, SyncSslCertificateStep, SyncSoloRecordSetStep re-tagged with ExecutesWebStep. * ChecksIfCommandsShouldBeRunning skip-clause halves in size. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(LPX-604): split headless gate from the tenancy-mode gate The previous compound-skip block lumped three different concerns into one OR-chain. Tenancy-mode mismatch (solo step in mt app, mt step in solo app) and "is this a public-facing step in a headless app" are orthogonal — readability wins from separation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: guard AWS account mismatch in Command::execute Before any AWS-touching command runs handle(), verify the manifest's aws.account-id matches sts:GetCallerIdentity for the resolved profile. Catches the foot-gun where YOLO_<ENV>_AWS_PROFILE points at the wrong account and the manifest's ID gets used silently for ARN interpolation. One STS call per command run (~50ms). Skips when the manifest doesn't declare an account (legacy / test fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: throw IntegrityCheckException from the account-mismatch guard `ensureManifestAccountMatchesProfile()` returned bool + called error() itself, while every other manifest-validation failure in the codebase (Manifest::apex, Helpers::validate*, EnsureEnvIsConfiguredCorrectlyStep, SyncListenerRuleStep::nextAvailablePriority, the SG-rule integrity guards) throws IntegrityCheckException. Aligning with the prevailing pattern: void return, throw on either STS lookup failure or account mismatch, Symfony's exception renderer handles the styled output for free. DX wins: single responsibility (guard either succeeds or throws), the caller drops the `if (! ...) return 1` shim, stack trace available with -vvv for debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: revert account guard to bool+error() pattern, add profileAccountId helper Two pieces of feedback: 1. Throwing IntegrityCheckException from Command::execute() broke the prevailing pattern at that layer — every other early gate (missing manifest, wrong environment, missing AWS_PROFILE) uses `error() + return 1`. The IntegrityCheckException throws all live deeper, in Steps and Manifest helpers. Reverted to bool return so the four early gates look identical. 2. Was reaching past Aws::accountId() to call Manifest::get() directly. Plus the STS side had no helper. Added Aws::profileAccountId() (named for the .env profile rather than 'caller', since the value is whatever YOLO_<ENV>_AWS_PROFILE resolves to) so both sides go through Aws::*. Presence check now Manifest::has('aws.account-id') rather than the truthy-check on Aws::accountId() — the latter has a `: string` return type that TypeErrors when the key is absent, and widening to `?string` would cascade into every existing caller using sprintf/concat. InitCommand carve-out at Command::execute() short-circuits before registerAwsServices(), so new apps don't hit the guard before they've written a manifest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: require aws.account-id in the manifest The "skip when not declared" early-return was a silent foot-gun — same one the whole guard exists to prevent. A manifest missing aws.account-id would happily run against whatever YOLO_<ENV>_AWS_PROFILE resolves to, with no safety net. Now bails with a clear "must declare aws.account-id under environments.<env>" message. InitCommand short-circuits before this guard, so new apps run `yolo init` without tripping it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: require name + aws.region in manifest integrity check Split the manifest-vs-AWS reconciliation into two early gates in Command::execute(): * ensureManifestIntegrity() runs *before* registerAwsServices(). Bails when any of `name`, `aws.region`, `aws.account-id` is missing. `aws.region` is the load-bearing one — RegistersAws feeds it into every SDK client at construction time, so a null region collapses into a deep AWS SDK error. `name` drives every keyedResourceName() and silently produces invalid AWS resource names (trailing hyphen) if absent. account-id presence is now also in this group. * ensureAccountMatchesProfile() runs *after* registerAwsServices() and STS-verifies the manifest's account-id against YOLO_<ENV>_AWS_PROFILE. Renamed from ensureManifestAccountMatches- Profile (the manifest part is implied now that integrity runs upstream). Test split: CommandManifestIntegrityTest (no AWS mocks needed) and CommandAccountGuardTest (STS mock). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract ensureNameDeclared + ensureManifestKeyDeclared ensureManifestIntegrity() reads as a series of named gates short- circuited via && rather than a foreach loop with implicit semantics. Each extracted method tells you what it's checking — the loop was clever but obscured the per-key checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: collapse ensureNameDeclared into ensureManifestKeyDeclared Manifest::has() now falls back to top-level when an env-scoped lookup misses. The dedicated ensureNameDeclared() method + its bespoke error message were the only special-case for top-level keys; collapsing both down keeps the integrity check a flat list of `Manifest::has` lookups. Error message dropped the "under environments.<env>" qualifier — works uniformly for top-level `name` and env-scoped `aws.*` keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: revert Manifest::has widening, keep ensureNameDeclared separate Widening Manifest::has() to fall back to top-level was solving one caller's problem by relaxing the contract of a widely-used helper — opens subtle ambiguity if a future env-scoped key ever shares a name with a top-level one. Reverted; ensureNameDeclared() handles the top-level `name` lookup directly via Manifest::current(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop comments that restate code routedHosts() comment block restated the ternary, and the tenants normaliser comment took three lines to caption a one-line null coalesce. Both code paths read for themselves. Kept comments elsewhere that explain non-obvious *why* — un-memoised SG cache (UsesEc2), AwsException translation (UsesEcs), narrow service drift detection (SyncEcsServiceStep), headless tenant raw read (Manifest::isHeadless). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop Manifest comments that restated code * `// prefer the apex key when specified` — restated what `get(key, default)` already does. * `// apex resolves to null for headless tenants` — the `?? ($config['domain'] ?? null)` chain already shows it. Kept the one-line `// Read raw — tenants() normaliser TypeErrors on a headless tenant.` in isHeadless() — that's a real why-not-use-tenants() signal that disappears without it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What this ships
sync:compute— the full provisioning surface for Fargate-shaped YOLO apps.Tracking: LPX-578.
TL;DR
SyncCommandonly enrolsSyncComputeCommandwhen the manifest hastasks.web— non-Fargate consumers stay unchanged.Steps in
sync:compute(in order)SyncEcrRepositoryStep— creates ECR repo + attaches a lifecycle policy. Defaults: expire untagged after 7d, keep last 30 tagged. Overridable viatasks.web.image-retention.{keep-count,untagged-days}.SyncEcsClusterStep— Fargate + FARGATE_SPOT capacity providers, container insights enabled.SyncTaskSecurityGroupStep— task ENI security group with inbound from the ALB SG only.SyncLoadBalancerStep— internet-facing ALB on the public subnets, looked up byaws.albor defaulted to the env-shared nameyolo-{env}.SyncTargetGroupStep— IP-target group with configurable health check.SyncHttpListenerStep— HTTP:80 → HTTPS:443 permanent redirect.SyncHttpsListenerStep— HTTPS:443 with ACM cert attached. ReturnsSKIPPEDcleanly when the cert isn'tISSUEDyet (re-sync after DNS validation completes) or when noapex/domainis set.SyncListenerRuleStep— host-based routing rule with bounded priority assignment in the 1000-49999 range. PurenextAvailablePriority()extracted and unit-tested for wrap, exhaustion, determinism, range.SyncTaskLogGroupStep— pre-creates the CloudWatch log group with configurable retention (tasks.web.log-retention, default 30d).SyncTaskDefinitionStep— Fargate-compatible task definition (awsvpc, configurable cpu/memory).SyncEcsServiceStep— ECS service with rolling deployment config (min 100% / max 200%). Drift detection coversdesiredCount+healthCheckGracePeriodSeconds. Task-definition revision adoption is owned byyolo deploy, not sync.Manifest shape
Manifest value validation
Three values flow into AWS API payloads through strict validators that throw
IntegrityCheckExceptionwith a clear message on bad input, before any AWS round trip:tasks.web.enable-execute-command—Helpers::validateStrictBool. The YAML-quoted string"false"parses asfalse(PHP's native(bool)cast would returntrue).tasks.web.image-retention.{keep-count,untagged-days}—Helpers::validatePositiveInt. Non-numeric strings rejected (not coerced to0).tasks.web.log-retention—Helpers::validateCloudWatchLogRetention. Restricted to CloudWatch's accepted enum (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653).Other AWS plumbing
Aws::ecr()/Aws::ecs()SDK client registrations.Aws::ecsTags()helper for the lowercasekey/valuetag shape ECS expects.UsesEcr/UsesEcsconcerns mixed intoAwsResources.publicSubnetIds()onUsesEc2.SecurityGroup::ECS_TASK_SECURITY_GROUP+SecurityGroupRule::ECS_TASK_LB_INGRESS_RULEenum cases.UsesElasticLoadBalancingV2::targetGroup()uses per-app naming (yolo-{env}-{app}) so multiple apps can share one ALB with separate TGs.UsesEc2::securityGroups()is intentionally un-memoised — steps frequently describe → catch-not-found → create → re-describe in a single sync and a stale cache silently breaks the re-read.UsesEcs::ecsService()andecsTaskDefinition()translate anyAwsException(incl.ClusterNotFoundExceptionon cold accounts) into the project's standardResourceDoesNotExistExceptionso the calling step's catch handles dry-run + create cleanly.Known follow-ups (separate tickets)
UsesEc2/UsesEcs/UsesEcr/UsesElasticLoadBalancingV2.ecsTaskExecutionRole+ per-app task roles.destroy:computeteardown command.Reviewer notes
tasks.web.execution-roledefaults to the bare'ecsTaskExecutionRole'role name; the AWS-managed role must exist in the target account orregisterTaskDefinitionfails.provisioningfor ~2-5 min after creation. AWS accepts listener creates against a provisioning ALB, so sync proceeds — butsync:computereturns before the ALB is actually serving traffic.:latestreference doesn't resolve untilyolo buildhas pushed an image. PR LPX-576 + LPX-579: build + deploy commands for first Fargate POC #26 ships the build + deploy half.