Skip to content

feat(project): add task types for pr review and pr iteration#11

Merged
krokoko merged 5 commits into
mainfrom
pr-iteration
Apr 9, 2026
Merged

feat(project): add task types for pr review and pr iteration#11
krokoko merged 5 commits into
mainfrom
pr-iteration

Conversation

@krokoko
Copy link
Copy Markdown
Contributor

@krokoko krokoko commented Apr 9, 2026

Related

Changes

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@krokoko krokoko requested a review from a team as a code owner April 9, 2026 01:41
@krokoko krokoko merged commit 210ee6e into main Apr 9, 2026
5 of 6 checks passed
@krokoko krokoko deleted the pr-iteration branch April 12, 2026 17:08
scoropeza pushed a commit to scoropeza/sample-autonomous-cloud-coding-agents that referenced this pull request May 5, 2026
…-user_id validator (krokoko review aws-samples#7, aws-samples#11)

Two small hardening changes paired under the theme "surface silent
failures at construction time / at run time". Both make existing
latent bugs visible instead of silently breaking at the worst
possible moment.

## Findings addressed

**aws-samples#7 — Reconciler silently succeeds when ALL transitions fail**

``reconcile-stranded-tasks`` previously logged per-task failures at
WARN and returned success unconditionally. A systemic failure (DDB
throttling at the shard level, IAM outage, schema corruption) could
strand 100% of candidates — and the only signal was a WARN log per
task plus a final INFO that looked like a healthy run. Operators
dashboarding "reconciler completed" counts would not notice the
outage.

New behaviour classifies the run result into three cases and picks
a log level accordingly:

  - ``stranded > 0 AND failed == 0 AND errors > 0`` → ERROR with
    ``error_id: 'RECONCILER_TOTAL_FAILURE'``. Systemic failure; alarm
    on the error_id string.
  - ``errors > 0 AND failed > 0`` → WARN with
    ``error_id: 'RECONCILER_PARTIAL_FAILURE'``. Dashboards signal;
    not alarm-worthy on its own.
  - Otherwise → INFO, as today.

The handler still does NOT throw — event-source-mapping invocations
complete normally. The log-level escalation IS the alarm signal,
matching the ``error_id`` convention already used in
``fanout-task-events.ts`` (``FANOUT_GITHUB_PERSIST_FAILED``).

**aws-samples#11 — TaskConfig missing @model_validator for trace=True + user_id=""**

The trace trajectory is uploaded to
``traces/<user_id>/<task_id>.jsonl.gz`` (design §10.1), and the
``get-trace-url`` handler refuses presigned keys outside the
caller's own ``traces/<user_id>/`` prefix. Pre-fix, a TaskConfig
built with ``trace=True`` and an empty ``user_id`` sentinel would
construct fine and fail later at S3 upload time — mid-task, when
the agent had already paid the cost of running.

Added a ``@model_validator(mode='after')`` on ``TaskConfig`` that
raises a descriptive ``ValueError`` when ``trace=True`` and
``user_id`` is empty. Construction fails immediately; local/dev
misconfigurations surface before the agent wastes tokens.

The error message cites design §10.1 + the get-trace-url handler's
prefix guard so the remedy is clear without cross-referencing other
files.

## Tests

**CDK reconciler (+4 tests):**
  - Total-failure case logs ERROR + ``RECONCILER_TOTAL_FAILURE``.
  - Partial-failure case logs WARN + ``RECONCILER_PARTIAL_FAILURE``.
  - Full-success case logs INFO (happy-path regression).
  - Empty-candidate case logs INFO (not alarming — absence of
    stranded tasks is the target state).

CDK suite: 1036 passing (was 1032).

**Agent TaskConfig (+3 tests, +2 pipeline realignments):**
  - ``test_trace_true_with_empty_user_id_raises_at_construction`` —
    validator fires immediately with the documented message fragment.
  - ``test_trace_true_with_valid_user_id_constructs_cleanly`` —
    happy-path regression.
  - ``test_trace_false_allows_empty_user_id`` — negative control;
    local/batch runs without an orchestrator still work as long as
    they do not opt into trace capture.
  - Two existing ``test_pipeline.py`` tests constructed
    ``TaskConfig(trace=True, user_id="")`` directly. These now
    either (a) pass a real ``user_id`` for the happy path, or (b)
    assert that construction raises — the tightened contract is
    strictly stronger than the previous "defensive-at-upload skip".

Agent suite: 500 passing (was 498; +3 new, −1 obsolete, +0 net from
pipeline realignment).

Refs: krokoko code review on PR aws-samples#52 (findings 7, 11)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant