feat(linear): add Linear integration, with Slack review fixes#63
Conversation
Adds full Slack integration enabling users to submit coding tasks by mentioning @shoof in any channel or DM, with real-time emoji reactions and threaded notifications showing task progress. Key features: - @mention task submission with natural language repo extraction - Emoji reaction progression: 👀 → ⌛ → ✅ - Threaded notifications (created, started, completed/failed) - Cancel button with instant feedback - DM support for private task submissions - OAuth multi-workspace install flow - `bgagent slack setup` CLI wizard for zero-friction onboarding - Account linking via /bgagent link New files: - CDK constructs: SlackIntegration, SlackInstallationTable, SlackUserMappingTable - Lambda handlers: events, commands, command-processor, interactions, oauth-callback, link, notify - Shared utilities: slack-verify, slack-blocks - CLI: slack.ts (setup, link commands) - Docs: SLACK_SETUP_GUIDE.md with screenshots
# Conflicts: # docs/guides/DEVELOPER_GUIDE.md # docs/guides/USER_GUIDE.md # docs/src/content/docs/developer-guide/Installation.md # docs/src/content/docs/developer-guide/Repository-preparation.md # docs/src/content/docs/user-guide/Overview.md
Add SLACK_SETUP_GUIDE.md to the explicit route map in sync-starlight.mjs, mirror it to using/Slack-setup-guide.md, and wire it into the astro sidebar. Without this, links to the Slack setup guide rendered as /architecture/... and the page itself never appeared on the site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1-20) Full response to the 20-point review on PR aws-samples#42. Grouped by theme below rather than strictly by item number so related edits stay together. **Signature + auth (items 2, 3, 9)** - Item 2: url_verification challenge used to be answered before signature verification, exposing the endpoint as a Slack handler without auth. Now every request is verified first. During initial setup — before the signing secret is populated — url_verification is still answered so the Slack App can be wired up, but every other request is rejected. - Item 3: previously any request with X-Slack-Retry-Num was acked with 200 immediately. Transient failures on app_uninstalled or tokens_revoked silently dropped the revocation and left the bot token live forever. Security-critical event retries now reprocess; UX events still short- circuit on retry. - Item 9: slack-verify's 5-minute secret cache had no invalidation. After a signing-secret rotation, warm Lambdas rejected every request for up to 5 minutes. New verifySlackRequest helper does verify-then-retry with forced refresh; invalidateSlackSecretCache exported for explicit eviction. All three authed handlers (events, commands, interactions) go through verifySlackRequest. **Error surfacing (items 4, 7, 20)** - Item 4: slack-notify caught every error per-record and logged warn, which made retryAttempts and bisectBatchOnError useless. Now distinguishes SlackApiError (swallow — not retryable via stream replay) from infra errors (rethrow so Lambda retries the batch). - Item 7: six .catch(() => {}) swallows in slack-events and slack-command-processor replaced with a shared slackFetch helper in shared/slack-api.ts that logs at warn. Benign Slack errors (already_reacted, no_reaction) are still silenced as expected. - Item 20: safeJsonParse in slack-notify logs the parse error and the truncated payload preview instead of swallowing silently. **Revocation ordering (item 6)** - revokeInstallation previously deleted the bot-token secret even if the DDB update to mark the installation revoked had failed, leaving DB status "active" with no backing token. Now the secret delete runs only if the DDB update succeeded; both steps throw on failure so Slack retries the event end-to-end. **UX fallbacks (items 8, 10)** - Item 8: when the command-processor Lambda invocation fails from handleAppMention, swap 👀 → ❌ and reply in-thread, mirroring the no-repo-found path. Previously the user was left with a stuck 👀. - Item 10: checkChannelAccess fails closed on definitively-blocking Slack errors (channel_not_found, not_in_channel, missing_scope). Transient or unknown errors (ratelimited, internal_error, network blips) fail open so a Slack outage doesn't block task submission; slack-notify surfaces any real delivery failure downstream. **Dedup ordering (item 17)** - Moved the channel_source === 'slack' check before the terminal dedup write so we don't poke non-Slack task records' channel_metadata with a Slack-specific marker. **Dead code + type safety (items 1, 13, 14)** - Item 1: delete unused handleStatus / handleCancel / statusEmoji from slack-command-processor. They were fully implemented but never wired into the switch(subcommand). Sam's PR aws-samples#52 roadmap delivers `bgagent status` via the CLI; the Slack Cancel button already exists. - Item 13: MentionPayload used to extend SlackCommandPayload and set phantom fields (`response_url: ''`, etc.). Refactored into a discriminated union `CommandProcessorEvent = SlashCommandEvent | MentionEvent`. slack-events.ts constructs MentionEvent directly; handleSubmit takes MentionEvent only. - Item 14: SlackBlock was a single interface requiring `as unknown as SlackBlock` to construct actions blocks. Now SectionBlock | ActionsBlock discriminated union with typed button variants. Casts removed; tests updated with narrowing helpers. **Shared utilities (items 15, 16, 19)** - Items 15/16: extract ChannelSource union from shared/types.ts and use it on both TaskCreationContext.channelSource and TaskRecord.channel_source. Extract truncate() and formatDuration() into shared/slack-format.ts so they stop drifting between slack-blocks and slack-command-processor. - Item 19: formatDuration now coerces via Number() for all callers, fixing the inconsistency where taskTimedOutMessage skipped the cast. **CLI error surfacing (item 11)** - cli/src/commands/slack.ts:getStackOutput used to swallow every CloudFormation error as "not deployed yet". Now only swallows ValidationError / does-not-exist; auth and permission errors bubble up to the user. **Excess IAM grant (item 18)** - slackCommandsFn had readSlackSecretsPolicy attached, but the slash- command acknowledger only needs the signing secret (already granted separately). Grant removed; comment added explaining the intent. **OAuth CSRF (item 5)** - Documented the tradeoff in slack-oauth-callback.ts. Slack's client_secret requirement on the code exchange is the primary defense; a full state nonce requires per-session signed storage and is tracked as a follow-up coordinated with PR aws-samples#52's dispatcher rework. **Not addressed in this PR (architectural feedback for follow-up)** Alain's "What I would like to see" items 1-5 — dispatcher pattern, notification-state extraction from channel_metadata, ADR for the adapter contract — are scoped to the PR that will rebase onto Sam's PR aws-samples#52 (feat(interactive-agents)) once it lands. Sam's locked plan establishes the FanOutConsumer router that those items depend on.
Addresses review item 12 / general feedback 4: the shared utilities (slack-verify, slack-blocks) and CDK constructs had solid coverage but all 7 Lambda handlers had zero unit tests, a departure from the rest of the codebase. Added test files covering the most load-bearing paths: - slack-events.test.ts (11 tests) — url_verification challenge with/ without signature (item 2), retry semantics (drop non-critical, reprocess app_uninstalled + tokens_revoked, item 3), revoke-installation ordering (item 6: no secret delete on DDB failure), app_mention forwarding, no-repo + Lambda-failure reaction swaps (item 8) - slack-commands.test.ts (4 tests) — signature rejection, inline help, async processor invocation, graceful processor failure - slack-interactions.test.ts (6 tests) — signature rejection, cancel_task happy path + owner rejection + terminal-state warn, unknown action_id, unlinked account - slack-notify.test.ts (8 tests) — non-Slack task short-circuit, dedup ordering after source check (item 17), dedup write skip on duplicate, SlackApiError swallow vs infra-error rethrow (item 4), non-INSERT / non-notifiable filter, malformed metadata logging (item 20) - slack-link.test.ts (6 tests) — 401 without JWT, 400 without code, 404 on missing/stale code, happy path write+delete, code normalization - slack-oauth-callback.test.ts (5 tests) — missing code, unpopulated client secret, token exchange failure, create-secret-on-RNF path, restore-then-update path for reinstall after uninstall - slack-command-processor.test.ts (9 tests) — legacy source-less payload normalization, slash-submit-rejection UX, mention link-prompt + repo validation + private channel guard (item 10 hard failure) + transient fail-open (item 10 softening), createTaskCore invocation, link code persistence, help text The regression tests for items 2, 3, 4, 6, 9 all fail the buggy pre-fix version of the code, verified by running them against the unchanged source during review.
Addresses blind-review feedback S2 + S3. **S2 — checkChannelAccess transient failures now fail open.** The prior "fail closed on any unknown Slack error" was too strict: a 30-second Slack ratelimited or internal_error during conversations.info would block every Slack task submission for that window, even though the notification path downstream could have recovered. Now: - Hard failures (channel_not_found, not_in_channel, missing_scope) still fail closed — these definitively mean the bot cannot post there, and silently creating an undeliverable task is the worse UX. - Transient or unknown errors (ratelimited, internal_error, fatal_error, network blips) fail open. slack-notify surfaces any real delivery failure when the event plays through. Regression test added: `mention submit fails open on transient Slack errors` asserts createTaskCore IS called when conversations.info returns `ratelimited`. **S3 — USER_GUIDE Slack bullet drops "check status".** handleStatus was deleted in this branch (review item 1) because Sam's locked plan in PR aws-samples#52 delivers `bgagent status` via the CLI rather than a Slack slash subcommand. The user-facing docs still claimed Slack could "check status" via `/bgagent`. Rewritten to describe the shipped behavior: @mention-based submission plus threaded progress notifications with reaction-based status. Starlight content regenerated via `node docs/scripts/sync-starlight.mjs`.
Trivial enabler for the Linear integration branch. No functional change: all existing code already uses the broad ChannelSource union; adding 'linear' makes the Linear-origin task records type-check once the inbound adapter starts writing them.
- shared/linear-verify.ts: HMAC-SHA256 hex verify over raw body,
60s webhookTimestamp replay window, secret caching + rotation retry.
- LinearProjectMappingTable: linear_project_id → repo + label_filter.
- LinearUserMappingTable: {workspace}#{user} confirmed / pending#{code}
with TTL and PlatformUserIndex GSI.
- linear-webhook.ts: verify + dedup (60s TTL on (issue_id, action))
+ async-invoke processor; replies fast inside Linear's 5s budget.
- linear-webhook-processor.ts: Issue label-add detection, project→repo
+ actor→platform user resolution, createTaskCore with
channelSource 'linear' + linear_* channel metadata.
- linear-link.ts: mirror of slack-link.ts.
- LinearIntegration construct: provisions both mapping tables, the webhook dedup table, 3 Lambdas (webhook / processor / link), 2 Secrets Manager placeholders, and API Gateway routes under /v1/linear/*. Deliberately no DynamoDB Streams consumer — outbound updates go through the Linear MCP from inside the agent container. - Stack: instantiates LinearIntegration in agent.ts and emits CfnOutputs for the two secret ARNs and both table names, consumed by `bgagent linear setup` / onboard-project.
- Orchestrator invoke payload now carries `channel_source` +
`channel_metadata`, so the agent knows which inbound channel a task came
from (and the Linear issue identifier for Linear-origin tasks).
- TaskConfig + build_config accept the new fields; server.py extracts them
from the /invocations payload and forwards into pipeline.run_task.
- channel_mcp.py: new `configure_channel_mcp(repo_dir, channel_source)`.
For channel_source=='linear' it merges a `linear-server` entry into
`.mcp.json` (keyed so tools surface as `mcp__linear-server__*`, matching
the verified MCP spec). For any other channel it's a no-op — the gate
that keeps Slack/API tasks from loading Linear tools.
- config.py: `resolve_linear_api_token()` reads LinearApiTokenSecret via
LINEAR_API_TOKEN_SECRET_ARN and caches into LINEAR_API_TOKEN so the SDK
child process inherits it for the MCP's ${LINEAR_API_TOKEN} placeholder.
- pipeline.py: resolves the token + writes .mcp.json just before run_agent.
- prompt_builder.py: appends a short Linear-specific addendum naming
`mcp__linear-server__save_comment` / `update_issue` by tool name.
- Stack: grants runtime read on LinearApiTokenSecret and sets the secret
ARN in the AgentCore runtime environment via CFN override.
- run.sh: forwards LINEAR_API_TOKEN / LINEAR_API_TOKEN_SECRET_ARN into the
local container for manual testing.
- cli/src/commands/linear.ts: new `bgagent linear` command group:
- `setup` — prompts for the Linear webhook signing secret and personal
API key, populates both placeholders in Secrets Manager, prints the
next-step checklist including the webhook URL.
- `link <code>` — calls POST /v1/linear/link with the Cognito JWT.
- `onboard-project <linear-project-id> --repo owner/repo [--label bgagent]`
— writes the Linear project → repo mapping row directly to
LinearProjectMappingTable via DynamoDBDocumentClient (admin IAM path,
matching the current onboarding UX for Slack + webhooks).
- api-client + types: LinearLinkResponse type + linearLink() method.
- bgagent.ts: registers the new command group.
- cli/package.json: adds @aws-sdk/client-dynamodb + @aws-sdk/lib-dynamodb
so the CLI can write project mappings.
- docs/guides/LINEAR_SETUP_GUIDE.md: new end-to-end setup walkthrough
mirroring SLACK_SETUP_GUIDE, covering API-key generation, webhook
configuration, onboarding, linking, and troubleshooting.
- docs/scripts/sync-starlight.mjs + docs/astro.config.mjs: routing + sidebar
entry for the new guide.
- docs/guides/USER_GUIDE.md: Linear now shows up as the 5th channel.
- cdk/test/stacks/agent.test.ts: DDB table count bumped 7 → 10 to cover
the three new Linear tables.
New test coverage for the Linear integration:
- cdk/test/handlers/linear-webhook.test.ts — HMAC pass/fail, stale
webhookTimestamp, non-Issue type is 200/no-op, missing data.id is 400,
dedup-hit short-circuits processor invoke, Lambda invoke failure surfaces
as 500 so Linear retries, malformed JSON with valid signature is 400.
- cdk/test/handlers/linear-webhook-processor.test.ts — project-not-onboarded
and status=removed skip, label filter gates on create AND on update
(updatedFrom.labelIds transition), unlinked Linear actor skips, happy
path asserts channel_source='linear' + full linear_* channelMetadata,
custom label_filter honored per project mapping.
- cdk/test/handlers/linear-link.test.ts — mirror of slack-link.test.ts:
401/400/404 error paths + confirmed-mapping write + code normalization.
- cdk/test/constructs/linear-integration.test.ts — resource-count
assertions (3 DDB tables, 3 Lambdas, 2 secrets, ZERO DDB Streams
mappings), env-var wiring for both handlers, dedup-table TTL shape.
- agent/tests/test_channel_mcp.py — channel gate (slack/api/webhook/empty
are no-ops; only linear writes), .mcp.json shape (server key
`linear-server`, hosted URL, ${LINEAR_API_TOKEN} placeholder), merge
semantics (preserves other servers, overwrites stale linear entry,
tolerates malformed JSON), missing-repo_dir guard.
All 853 CDK tests and 294 agent tests pass locally.
Exempt /v1/linear/webhook from AWSManagedRulesCommonRuleSet via a scopeDownStatement. Linear Issue webhook payloads routinely exceed the ruleset's default 8 KB body-size limit (observed 8614 bytes), and the full payload is needed by the webhook Lambda for HMAC verification. The route has stronger, bespoke protection already: - HMAC verification against LinearWebhookSecret on the raw body - webhookTimestamp replay guard (60s window) - Structured JSON parsing with strict field validation - RateLimitRule (1000/IP/5min) is priority 3 and still applies CRS continues to block on every other route (user API, Slack, etc.).
The prior dedup key was issueId#action, which collapsed distinct label-toggle deliveries on the same issue into a single dispatch — a label remove/reapply within 60s would be silently dropped. Also, the prior TTL (60s) was shorter than Linear's retry horizon (+1m, +1h, +6h). A dispatched-but-ack-lost delivery would re-dispatch on the +1h or +6h retry after the dedup row had expired, double- creating the task. Fix: - Compose the dedup key as issueId#action#webhookTimestamp. Linear reuses webhookTimestamp across retries of one delivery (correctly collapsing them) but assigns a fresh timestamp to each new delivery (no collapse). This is the primitive documented in linear.app's webhook spec; the payload's webhookId field is the webhook-config id, not per-delivery. - Raise DEDUP_TTL_SECONDS to 8h, comfortably over Linear's 7h retry horizon. Added a regression test that two distinct deliveries for the same issue both dispatch, and an assertion that the TTL is >= 7h.
When shouldTrigger rejects an update event, add current_labels, updated_from_keys, updated_from_label_ids, and current_label_ids to the log line so we can diagnose cases where the filter drops a valid label-added transition. No behaviour change; diagnostic-only.
Three CLI UX improvements discovered during first-time setup: 1. Auto-link the token owner during `bgagent linear setup`. After storing the webhook secret and API key, query Linear's `viewer` + `organization` via GraphQL using the token, decode the caller's Cognito sub from the cached id_token, and write the LinearUserMapping row directly (status='active', link_method='auto_setup'). This skips the manual code-exchange ceremony for the common case where the person installing ABCA for their workspace is the same person who'll trigger tasks. Failures are fail-open: if GraphQL errors, the API key is malformed, or the CLI isn't logged in, setup prints a warning and continues. Secrets are still stored; the user can run the link flow manually. 2. `bgagent linear list-projects` — lists all projects visible to the stored API token with their full UUIDs. Solves the "Linear project URL contains a truncated UUID" footgun. 3. UUID validation on `onboard-project`. RFC-4122 shape check up- front, with a clear error message pointing at `list-projects`. Avoids silent mapping-row writes for slug-shaped UUIDs that would never match a real Linear payload. Tests cover the auto-link branch, fail-open cases, and the GraphQL wire shape.
Mirror the Slack integration's terminal-emoji status UX on Linear issues: post 👀 when the agent container starts, swap it for ✅ on success or ❌ on failure (including the pipeline's outer-except crash path). Implementation notes: - Linear's MCP v1 does not expose a reactions tool (verified 2026-05-06 against the `tools` array in the Claude Agent SDK's system init). We use direct GraphQL against `https://api.linear.app/graphql` with the same token the MCP server uses. Retire this module in favour of a prompt addendum once Linear MCP ships `create_reaction`. - Gated on `channel_source == 'linear'` + `linear_issue_id` in channel_metadata. No-op for Slack/API/webhook channels. - `react_task_started` returns the reaction id; the pipeline threads it into `react_task_finished` so we can `reactionDelete` the 👀 before posting the terminal emoji. Matches Slack's behaviour of not leaving a lingering "watching" marker. - All GraphQL failures are logged and swallowed — reactions are advisory UX, never load-bearing. - Crash-path (pipeline.py outer except): calls react_task_finished(success=False, ...) as a best-effort so a crashed/failed task doesn't leave a dangling 👀. Tests cover: - Channel gate (slack/api/webhook all no-op; linear with no issue_id also no-op) - GraphQL wire shape and variables - Delete-then-create swap on terminal transition - Fallback path when started_reaction_id is None - All error swallowing paths (HTTP error, connection exception, GraphQL errors, missing token)
The Linear MCP's tool for updating an issue is
mcp__linear-server__save_issue, not update_issue. Earlier prompt
referenced the wrong name; the agent silently skipped the state
transition on small/fast tasks because the advisory framing
("use its tools to keep the reporter informed") let the model
deprioritize the calls.
Rewrite the addendum as a required three-step contract:
1. On start — save_comment 🤖 Starting
2. On PR open — save_comment with PR URL + save_issue (transition
to In Review; fall back to In Progress; skip if neither state
exists rather than loop list_issue_statuses)
3. On completion/failure — save_comment with final status
Added explicit "If neither exists, skip the state transition — the
PR comment alone is enough. Do not invent state names or loop on
list_issue_statuses." so an impoverished workspace state-set
doesn't trigger retry behaviour.
test_background_thread_failure_503_and_backup_terminal_write polled /ping for _background_pipeline_failed = True, but the flag flips several lines before write_terminal runs in the same except block. On the first run after a fresh import the poll could win the race and mock_write.assert_called() would fire before the call happened. Swap the poll-the-flag pattern for a wait-until-the-background-thread- has-exited loop, which guarantees both side effects (the flag flip and the backup write_terminal call) have completed before we assert. Observed flaky 2026-05-06 in a full-suite run; reliably green after the fix across 5 consecutive runs.
Restructure the setup guide so the webhook URL is available before the user needs to create the webhook (prior ordering was circular). - Step 1: generate Linear API key - Step 2: run `bgagent linear setup` — it prints the webhook URL and pauses at the signing-secret prompt - Step 3: create the webhook in Linear, copy the signing secret - Step 4: back in the terminal, paste the signing secret + API key; auto-link runs for the token owner - Step 5: `bgagent linear onboard-project <uuid>` — using the `list-projects` command to find the full UUID (the URL slug is truncated) - Step 6: manual code-exchange flow, now marked as fallback for linking additional Linear users beyond the token owner Synced to docs/src/content/docs/using/Linear-setup-guide.md via mise //docs:sync.
Merges Sam's PR aws-samples#52 (FanOutConsumer dispatcher) and adjacent upstream changes on top of the Linear integration work. Key reconciliations: - `ChannelSource` union keeps all four members ('api', 'webhook', 'slack', 'linear') with upstream's better docstring. Fixed an upstream test that expected 'slack' to be an invalid value. - `TaskRecord` / `TaskConfig` / pipeline signatures now include both our `channel_source` + `channel_metadata` and upstream's `trace` + `user_id` fields (all additive). - Agent pipeline runs both new hooks in order: Linear reactions (swap 👀 for ✅/❌) then upstream's `_maybe_upload_trace`, so the reaction fires before any trace S3 upload that could be slow. - Crash path invokes both Linear reaction (best-effort ❌) and `_maybe_upload_trace` (debug artifact) so a crashed task still produces a meaningful Linear signal and a usable trace. - `server.py` adopts upstream's refactored _validate_required_params / _spawn_background structure; our `channel_source` + `channel_metadata` are threaded through the params dict. - WAF scope-down (Linear webhook path exemption from CRS body-size rule) and our 8h dedup TTL are preserved unchanged. - Agent test count went from 306 to 526; CDK suite count went from 853 to 1121. All green. FanOutConsumer (new from aws-samples#52) and LinearIntegration sit side-by-side in `stacks/agent.ts` — FanOutConsumer is independent of Linear's wiring. Linear continues to route outbound updates via agent-side MCP rather than subscribing to the fanout dispatcher (by design; keeps per-task Linear comment/state updates in the container where the task actually runs). See things-to-do.md for the "evaluate linear-notify under FanOutConsumer" follow-up. Keeping it parked: MCP-based outbound is already shipped and validated; moving to a fanout subscriber would be a strategy change, not a bug fix.
Roadmap doc has no Linear content; referring to it misled readers. Replace the link with a plain explanation of the admin-assisted path.
Describe the v1 linking model accurately: ABCA ships the consumer command (`bgagent linear link <code>`) but no Linear-side code generator — the comment-triggered flow is a planned v1.1 follow-up. Design around API-token-owner-as-sole-submitter until then.
Fix two `no-duplicate-imports` violations in the Slack tests by merging `import type` + `import` pairs into single imports. ESLint/ruff also applied cosmetic reformatting (whitespace alignment in object literals, line-splitting long inline objects). One `# noqa: S105` added to `LINEAR_API_TOKEN_ENV` — it's an env var *name*, not a secret value, so ruff's hardcoded-password rule is a false positive there. No behavioural changes. Ensures `mise run build` passes the self- mutation guard in .github/workflows/build.yml.
|
Feedback from automated review: Required before merge #: 1 Recommended follow-ups (v1.1)
|
|
Just adding as a follow up: will open an issue after to use AgentCore gateway The current approach isn't wrong — it's a conscious shortcut for v1. But as the project adds more external MCP integrations, the .mcp.json injection pattern doesn't scale. The Gateway is the project's stated integration point for external tools, and Linear should converge on it. |
1) Linear webhook: roll back dedup row on Lambda invoke failure Previously, the dedup row was written before invoking the processor Lambda. If InvokeCommand failed (throttle, 5xx, etc.), the handler returned 500 and relied on Linear's retry schedule (+1m / +1h / +6h) to redeliver — but all three retries fell inside the 8h dedup TTL, so each one hit ConditionalCheckFailedException and silently acked as "already processed." Net: a single transient invoke failure dropped the task for 8 hours. Fix: on invoke failure, delete the dedup row before returning 500 so the next retry can reserve it again. A secondary try/catch logs (but does not re-raise) if the delete itself fails — the primary invoke error is the user-visible signal. Added two regression tests. 2) CLI ChannelSource sync with CDK cli/src/types.ts:30 still had 'api' | 'webhook'; CDK had 'api' | 'webhook' | 'slack' | 'linear' since the Slack/Linear inbound adapters landed. Any CLI switch on TaskDetail.channel_source was narrowing against a stale union. Widened the CLI type and refreshed the jsdoc to match the CDK side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough review — both blockers fixed in 7d652a4. #1 — Dedup row on invoke failure. Agreed this is a silent-drop trap; the 8h TTL fully swallows Linear's +1m/+1h/+6h retry horizon. Fix was the "delete on invoke-error" shape (not "write after invoke") — keeping the write before invoke preserves the dedup guarantee for the common case where a retry races an in-flight ack. Added two regression tests: one asserts the #2 — CLI v1.1 follow-ups — all five parked in our internal backlog and slotted for the v1.1 pass:
AgentCore Gateway note — acknowledged. Convergence on Gateway as the MCP integration point makes sense as the project adds more external MCPs; happy to pick that up as a follow-up once the issue lands. Full test counts post-fix: CDK 1170 (+2 rollback tests), agent 526, CLI 196 — all green. |
nits Wraps the v1.1 polish theme from PR aws-samples#87. Five small additions, all agent-side or docs: State-on-start (the user-visible one): - prompt_builder._channel_prompt_addendum now instructs the agent to transition the originating Linear issue to `In Progress` (or `Todo` fallback) at agent-start, mirroring the existing `In Review` chain fired at PR-open. Closes the gap where the issue stayed at `Backlog` during real agent work — only the 👀 reaction and "🤖 Starting" comment signaled progress, while humans-using-Linear expect the state column to reflect "being worked." Skips if the issue is already in `In Progress` or any later state; doesn't loop on list_issue_statuses. Alain aws-samples#63 review nits (4 small surgical changes): - linear_reactions.py: auth-failure circuit breaker. Track consecutive 401/403s; after 3 strikes, log ERROR once and short-circuit all later _graphql calls (return None) until the container restarts. Resets on any 2xx response. Replaces the prior behaviour where revoked tokens flooded CloudWatch with WARNs and wasted Linear API quota indefinitely. - pipeline.py: declare `linear_eyes_reaction_id: str | None = None` explicitly before the try block instead of relying on `locals().get("linear_eyes_reaction_id")` in the crash handler. Functionally identical; survives refactors and reads cleanly. - config.py::resolve_linear_api_token: narrow `except Exception` to `(BotoCoreError, ClientError)` from botocore.exceptions. Switch `print()` to `shell.log("WARN", ...)` so warnings join the structured log stream the rest of the agent uses. - LINEAR_SETUP_GUIDE.md + cli/src/commands/linear.ts: stop telling users to run `bgagent linear link <code>` when auto-link fails — the code generator is a v3 feature that doesn't ship in v1, so the suggestion was misleading. Replaced with explicit admin-assisted fallback (DynamoDB put-item with steps to find workspaceId, viewerId, Cognito sub) and a clear "this command exists but is non-functional in v1" note. Tests: 532 agent + 1268 cdk + 196 cli, all green. Deployed to backgroundagent-dev. Smoke-tested 👀-on-start (156ms, agent unblocked) in the prior commit; state-on-start smoke is the next manual step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- linear_reactions: log a single DEBUG line when the auth circuit breaker short-circuits a call, so the path isn't zero-trace once open. - config: split the `(BotoCoreError, ClientError)` catch so `AccessDeniedException` logs at ERROR instead of WARN — IAM misconfig is persistent and should page someone, not blend into transient warnings. Also drop the personal name from the inline reference to the aws-samples#63 review. - linear-webhook-processor: tighten `buildCreateTaskFailureMessage` param types to `number` / `string` (no `| undefined`) — the only caller passes `APIGatewayProxyResult` fields which are always defined. Removes dead fallback-to-`'unknown'` branches. - test_config: 2 new tests covering the split exception path (AccessDenied → ERROR; ResourceNotFound → WARN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…weep, nits (#87) * feat(linear): comment + react on pre-container task-creation failures Closes the silent-drop UX gap that appeared whenever a Linear-triggered task was rejected before the agent container started — the user would apply the trigger label, see nothing happen, and have no way to know why. Reactions and progress comments are emitted by the agent container; nothing fired until that point, so all upstream rejections were invisible on the Linear side. This commit wires a best-effort GraphQL feedback path covering all six distinct rejection points: In `linear-webhook-processor.ts` (pre-`createTaskCore`): 1. Issue has no projectId → "isn't in a project" comment 2. Project not onboarded / removed → "isn't onboarded; admin can run `bgagent linear onboard-project`" comment 3. Webhook missing organization or actor → diagnostic comment 4. Linear actor has no linked platform user → "v1 only the API-token owner can submit; multi-user OAuth is on the v3 roadmap" comment 5. `createTaskCore` returns non-201 → message branched on status: guardrail/validation block surfaces the user-facing error string; 503 prompts the user to re-apply the label; other 4xx/5xx falls through to a generic message. In `orchestrate-task.ts` (post-201, in admission control): 6. User concurrency cap rejection → "concurrency limit; wait for one to finish, then re-apply the label" comment. All five processor paths and the orchestrator path call a shared helper, `reportIssueFailure(secretArn, issueId, message)`, that runs the comment and ❌ reaction in parallel via `Promise.allSettled`. The helper: - Reuses the existing 5-minute `getLinearSecret` cache from `linear-verify.ts` (no extra Secrets Manager hits on warm Lambdas). - Swallows network, auth, and GraphQL errors with WARN logs — Linear feedback is advisory and must never gate the rejection path. - Posts to Linear's hosted GraphQL endpoint; mutation shapes match `agent/src/linear_reactions.py` (`commentCreate`, `reactionCreate`). CDK plumbing: - `linear-integration.ts` — wires `LINEAR_API_TOKEN_SECRET_ARN` into the webhook processor and grants read on the existing `LinearIntegration.apiTokenSecret`. - `agent.ts` — grants the same secret to `orchestrator.fn` and populates the env var. The grant is unconditional; the orchestrator only invokes the helper when `task.channel_source === 'linear'`. The non-Linear case is a hard no-op at the call site — `notifyLinear- OnConcurrencyCap` early-returns on `channel_source !== 'linear'`, and the processor only handles Linear payloads. Slack/API/webhook tasks are unaffected. Tests (28 new; 1240 → 1268, all green): - `cdk/test/handlers/shared/linear-feedback.test.ts` (13 tests): mutation shape, auth header, error swallowing in 4 distinct failure modes (secret-resolution null, non-2xx, GraphQL `errors`, network throw), `Promise.allSettled` partial-success semantics. - `cdk/test/handlers/linear-webhook-processor.test.ts` (10 new tests in a `user-visible feedback` describe block): one assertion per rejection path + happy-path-doesn't-fire + filter-rejection-doesn't- fire (the latter is intentional UX — the processor sees many events that aren't tasks, and dropping a comment on each would be noisy). - `cdk/test/handlers/orchestrate-task-feedback.test.ts` (5 tests): new file; covers `notifyLinearOnConcurrencyCap` directly with `withDurableExecution` mocked. Asserts the linear path fires; the api/webhook/slack paths no-op; missing metadata, missing env, and undefined `channel_metadata` all no-op cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(linear): finish v1.1 polish — state-on-start + Alain #63 nits Wraps the v1.1 polish theme from PR #87. Five small additions, all agent-side or docs: State-on-start (the user-visible one): - prompt_builder._channel_prompt_addendum now instructs the agent to transition the originating Linear issue to `In Progress` (or `Todo` fallback) at agent-start, mirroring the existing `In Review` chain fired at PR-open. Closes the gap where the issue stayed at `Backlog` during real agent work — only the 👀 reaction and "🤖 Starting" comment signaled progress, while humans-using-Linear expect the state column to reflect "being worked." Skips if the issue is already in `In Progress` or any later state; doesn't loop on list_issue_statuses. Alain #63 review nits (4 small surgical changes): - linear_reactions.py: auth-failure circuit breaker. Track consecutive 401/403s; after 3 strikes, log ERROR once and short-circuit all later _graphql calls (return None) until the container restarts. Resets on any 2xx response. Replaces the prior behaviour where revoked tokens flooded CloudWatch with WARNs and wasted Linear API quota indefinitely. - pipeline.py: declare `linear_eyes_reaction_id: str | None = None` explicitly before the try block instead of relying on `locals().get("linear_eyes_reaction_id")` in the crash handler. Functionally identical; survives refactors and reads cleanly. - config.py::resolve_linear_api_token: narrow `except Exception` to `(BotoCoreError, ClientError)` from botocore.exceptions. Switch `print()` to `shell.log("WARN", ...)` so warnings join the structured log stream the rest of the agent uses. - LINEAR_SETUP_GUIDE.md + cli/src/commands/linear.ts: stop telling users to run `bgagent linear link <code>` when auto-link fails — the code generator is a v3 feature that doesn't ship in v1, so the suggestion was misleading. Replaced with explicit admin-assisted fallback (DynamoDB put-item with steps to find workspaceId, viewerId, Cognito sub) and a clear "this command exists but is non-functional in v1" note. Tests: 532 agent + 1268 cdk + 196 cli, all green. Deployed to backgroundagent-dev. Smoke-tested 👀-on-start (156ms, agent unblocked) in the prior commit; state-on-start smoke is the next manual step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(format): apply ruff format Whitespace-only changes flagged by CI's self-mutation guard. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): address PR #87 must-fix review items - linear_reactions: guard auth-circuit globals with `_auth_state_lock` so the daemon sweep thread and the main thread can't race the read-modify-write on `_consecutive_auth_failures` / `_auth_circuit_open`. - linear_reactions: wrap the daemon sweep target in `_sweep_stale_reactions_safe` so an unexpected exception logs at ERROR instead of dying silently (stderr from a daemon thread doesn't reliably reach CloudWatch). - linear_reactions: only increment the sweep delete counter when `_graphql(_DELETE_MUTATION, ...)` actually returns a non-None response — previously the summary log overstated success. - config: hoist `import boto3` out of the catch-narrowed try/except so an `ImportError` (boto3 missing from the image) degrades to a WARN log instead of crashing the agent. - orchestrate-task: wrap `notifyLinearOnConcurrencyCap` in a defensive try/catch — durable-execution retries the entire admission-control step on throw, which would re-fire `failTask` + `emitTaskEvent` and produce duplicate events. - tests: 1 new throw-propagation test for `notifyLinearOnConcurrencyCap`, 3 new tests for `resolve_linear_api_token` (cached env, no-arn, ImportError fallback). Auto-reset fixture in `test_linear_reactions.py` now also resets the circuit-breaker globals between tests so future cases don't leak state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(linear): address PR #87 nice-to-have review items - linear_reactions: log a single DEBUG line when the auth circuit breaker short-circuits a call, so the path isn't zero-trace once open. - config: split the `(BotoCoreError, ClientError)` catch so `AccessDeniedException` logs at ERROR instead of WARN — IAM misconfig is persistent and should page someone, not blend into transient warnings. Also drop the personal name from the inline reference to the #63 review. - linear-webhook-processor: tighten `buildCreateTaskFailureMessage` param types to `number` / `string` (no `| undefined`) — the only caller passes `APIGatewayProxyResult` fields which are always defined. Removes dead fallback-to-`'unknown'` branches. - test_config: 2 new tests covering the split exception path (AccessDenied → ERROR; ResourceNotFound → WARN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): address PR #87 re-review must-fix items - linear-webhook-processor: extracted `safeReportIssueFailure` helper and routed all 5 bare `await reportIssueFailure(...)` call sites through it. The helper is uniformly non-throwing — wraps the call in try/catch to defend against a future signature refactor that could break the helper's `Promise.allSettled` never-throw contract. A synchronous throw would otherwise propagate, fail the Lambda, and trigger SQS retries on a poison message. - linear-webhook-processor: dropped the `!` non-null assertion on `process.env.LINEAR_API_TOKEN_SECRET_ARN` at module scope. The helper now guards on `!API_TOKEN_SECRET_ARN` and logs a single clear `Skipping Linear feedback: LINEAR_API_TOKEN_SECRET_ARN not set` diagnostic per skip — matches the orchestrator pattern in `notifyLinearOnConcurrencyCap`. - test_linear_reactions: new `TestAuthCircuitBreaker` class with 5 tests covering the previously-untested circuit: * 3 consecutive 401/403s open the circuit * Once open, calls short-circuit without hitting the network * A 2xx between failures resets the counter * Non-auth status (500) doesn't increment the counter * 401 and 403 are both treated as auth failures - test_linear-webhook-processor: 2 new tests assert `safeReportIssueFailure` swallows both synchronous throws and async rejections from the underlying helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(linear): address PR #87 re-review nice-to-have items - test_config: cover the BotoCoreError branch of `resolve_linear_api_token` with an `EndpointConnectionError` case. The PR-#87 split into ClientError + BotoCoreError branches previously had no test on the BotoCoreError path. - test_linear_reactions: new `test_sweep_preserves_just_posted_eyes_via_exclude_id` exercises the `exclude_id` filter — the existing sweep test never collided prior reaction ids with the newly posted one, so the branch was effectively dead code in tests. The new test plants the just- posted 👀 in the prior reactions list and asserts it survives the sweep while an older ❌ is deleted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): annotate circuit breaker globals so ty doesn't narrow `ty check` infers `_consecutive_auth_failures = 0` as `Literal[0]` and `_auth_circuit_open = False` as `Literal[False]`, which then rejects the legitimate runtime flips (and the test fixture that resets them between cases). Adding explicit `int` / `bool` annotations widens the inferred type and fixes the CI typecheck failure introduced in `f4633be`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
Replace the personal-API-key flow with the Linear OAuth `actor=app` install path verified by the 2.0b spike. Major changes: - Step 1: AgentCore credential provider via `bgagent linear oauth-register-workspace`, capturing the AWS-hosted callback URL that Linear will actually see. - Step 2: Linear OAuth app creation via `bgagent linear app-template`, documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON gates that produce Linear's misleading "Invalid redirect_uri" error when missing. - Step 4: OAuth dance via the rewritten `bgagent linear setup` — ephemeral localhost HTTPS callback; no own ALB/Lambda needed since AWS proxies the OAuth flow. - Step 7: clarify that the PAK-owner auto-link becomes the setup-runner auto-link; the manual DDB mapping path stays for now until self-service `@bgagent link` ships. - New "Adding additional Linear workspaces" section for multi-workspace deployments. - New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook. - Troubleshooting expanded to cover the Invalid-redirect_uri and 401-from-Linear scenarios surfaced in the spike. Notes the docs reference commands shipping in Wave 2 (aws-samples#63 oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the 2.0b branch is a coherent unit and aws-samples#62 must land before those flows are wired so the docs aren't a moving target during implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
channel_source='linear'. Agent updates the originating Linear issue inline (comments, state transition, reactions) via Linear's hosted MCP server + a direct GraphQL client for reactions. Live demo: https://isadeks.github.io/abca-linear-demo/ (7 agent-authored PRs merged, site deployed per merge).checkChannelAccess. Originally on a separate local branch; folded into this PR per your preference for a single combined review surface.aws-samples/mainto absorb Sam's PR feat(interactive-agents): async-only background task UX + Cedar HITL design #52 (FanOutConsumer) and adjacent changes. Resolved 13 conflicts across schema additions, construct wiring, agent pipeline ordering, and a flaky-test fix. Full test suite green post-merge (CDK 1121, agent 526, CLI 76).Response to your #42 review concerns
Being honest about what Linear addresses vs. sidesteps:
SlackNotifyFn+FanOutConsumer= exactly 2/2 at the AWS limit. Structural fix still needs to happen, not in this PR — migratingSlackNotifyFn(from #42) intoFanOutConsumeras a dispatcher subscriber would drop the count back to 1 and restore headroom. We deliberately held the line and didn't add a third.channelSource === 'linear'appears only inlinear-webhook-processor.ts(one call site, one docstring). If a fourth channel ships as a stream consumer in the future, the duplication concern returns — but the #52 dispatcher pattern is the intended long-term shape for that; future channels should subscribe to it rather than add another filter-in-each-consumer.linear-webhook.tsonly, with key${issueId}#${action}#${webhookTimestamp}(8h TTL, exceeds Linear's 7h retry horizon). No TaskRecord-level dedup extensions.channel_metadataaccumulates lifecycle state (e.g.slack_session_msg_ts)linear_*_msg_tsfields onTaskRecord. All per-issue state (comments, state id, reaction id) lives in Linear itself, accessed at runtime via MCP + GraphQL. The container is the only place that touches Linear state; nothing is stashed on the record.Architectural notes
Why MCP-based outbound instead of a FanOutConsumer subscriber. Linear ships a first-class MCP server (
https://mcp.linear.app/mcp), so the agent can update the issue from inside the container using tools — same place the work is happening, no extra Lambda hop, no duplicated state. The fanout pattern is a better fit for ephemeral notifications (Slack thread updates, email) than stateful back-and-forth with a ticket (comment threads, state transitions, reactions that need the originating issue id).Operationally:
linear-integration.tshas zero stream consumers and zero outbound Lambdas. The only Lambdas are the webhook receiver, the async processor, andlinear-link(CLI account linking). Everything user-visible happens in the agent container.WAF scope-down on
/v1/linear/webhook. Linear Issue webhook payloads routinely exceed the 8 KB body-size rule inAWSManagedRulesCommonRuleSet(observed 8614 bytes in smoke). Scoped the CRS out for this one path viascopeDownStatement. The route retains HMAC verification,webhookTimestampreplay guard (60s), JSON schema validation, webhook dedup, and the priority-3 rate-limit rule.AWSManagedRulesKnownBadInputsRuleSetremains unchanged.Auto-link on
bgagent linear setup. Linear identifies the API-key owner viaviewer { id }, so we look up the caller's Cognitosubfrom the cached id_token and write theLinearUserMappingrow directly. Fail-open on auth / network errors (secrets still get stored; user is told to re-runbgagent login). This closes the "now go pastebgagent linkin a comment" dance for the common single-admin case.v1 multi-user scope. Only the API-token owner can submit tasks from Linear in v1. Tasks triggered by other Linear users are dropped by the processor with
"Linear actor has no linked platform user — skipping task creation".bgagent linear link <code>is registered as a CLI command but the Linear-side code generator isn't in v1 — OAuth bot install is the proper multi-user UX and will subsume any link-code bridge, so shipping the bridge didn't seem worth it. Constraint is documented in the setup guide.In scope / out of scope
In this PR (v1 / shipped):
LinearProjectMappingTable,LinearUserMappingTable) + dedup tableLinearIntegrationCDK construct (zero stream consumers, zero outbound Lambdas)configure_channel_mcp, gated onchannel_source='linear')create_reaction)bgagent linear setup(+ auto-link),bgagent linear link,bgagent linear list-projects,bgagent linear onboard-project(+ UUID validation)save_comment,save_issuewith graceful "no matching state" fallback)test_serverrace fixOut of scope (planned follow-ups):
/bgagent retryin a Linear comment) for partial-PR recovery.Commentwebhook subscription + slash-command parser in processor.bgagent cancelSIGKILLs the container, so Linear-side ❌ never fires from the agent'sexcept.Validation
webhookTimestampkeys → no delivery collapseonboard-projectrejects the Linear-URL-slug formbgagent cancelmid-run (documents the v3 gap)aws dynamodb describe-table --query 'Table.StreamSpecification'onTaskEventsTable— 2 consumers (SlackNotifyFn+FanOutConsumer); Linear adds 0.grep -rn "channelSource === 'linear'" cdk/src/handlers/— only inlinear-webhook-processor.ts.TaskRecordhas nolinear_*_msg_tsfields.Test plan for reviewer
mise //cdk:deploy— CFN diff should add onlyLinearIntegration/*resources on top of currentaws-samples/main. Slack beyond Lambda bundle refreshes should not change.bgagent linear setup. It prints the webhook URL, pauses at the signing-secret prompt. Create a webhook in Linear at that URL (subscribe toIssue), copy the signing secret, return to the terminal and paste secret + API key. Auto-link fires for the token owner; success message:✓ Linked Linear user … → platform user ….bgagent linear list-projects→ copy the target project UUID.bgagent linear onboard-project <uuid> --repo <your-fork>/<demo-repo> --label abca.abcalabel applied at creation time.🤖 Starting…comment, PR opens, state →In Review, 👀 swaps for ✅ when the PR URL comment is posted and the task finishes.Closes #42.