Repair deterministic workflow gate failures with agents#826
Conversation
…l + cloud adapters
Mirrors the github-primitive's adapter shape so the same workflow file
runs locally (gh CLI / Slack token) and in cloud (Nango). Two
flagship verbs:
- postMessage — fire-and-forget human notification (PR opened,
workflow done, etc.). Resolves @-mentions and #channel names at
step time.
- askQuestion — block the workflow on a human reply. Configurable
timeout, replyMatch (regex/choice/any), allowedReplyFrom, and
Block Kit interactive forms. Resumable across sandbox restarts
via run-record metadata so retries don't re-ask.
Captures the cultural change the primitive enables: agents should ask
for clarification when blocked rather than guess. Includes two
recipes ("Announce + Done", "Ask Before You Guess") that the
writing-agent-relay-workflows skill will pick up when v1 ships.
Cloud-runtime auth reuses the workspace's existing Nango Slack
connection — no new SST resource bindings. Slack tokens don't rotate
per-call, so the proxy form (nango.proxy({ endpoint: '/chat.postMessage' }))
is the right shape — avoids the get-token-vs-proxy confusion the
github-primitive had to work through.
Phasing:
A — postMessage + resolvers
B — askQuestion (blocking, resumable)
C — Block Kit interactive forms + utility verbs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exposes the bundled `@agent-relay/github-primitive` from the root
`@agent-relay/sdk` so workflow authors no longer need the subpath
import. Two new shapes added alongside the existing `/github` subpath:
- `import { github } from '@agent-relay/sdk'` — full namespaced surface,
no collision risk with other root exports.
- `import { createGitHubStep, GitHubClient } from '@agent-relay/sdk'` —
curated helpers for the common workflow-author path.
Avoided a flat `export *` because the primitive ships ~40 generic-named
action helpers (createFile, readFile, getUser, errorMessage, ...) that
would pollute the root namespace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR implements deterministic repair-agent support for deterministic workflow steps (types/schema/builder, runner resolution/prompt/execution), adds SDK GitHub re-exports, compacts and finalizes many trajectory artifacts and updates the index, and introduces a Slack primitive design spec with tests and README updates. ChangesDeterministic Gate Repair Feature
GitHub Primitive SDK Exports
Workflow Trajectory Artifacts
Slack Primitive Specification
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
.trajectories/completed/2026-05/traj_o9cx33xn5u39.json (1)
367-367: The review chapter took 18 days to complete.The review chapter's duration from
startedAt(2026-04-20T15:19:38.968Z) toendedAt(2026-05-08T13:33:35.341Z) spans approximately 18 days, accounting for nearly all of the workflow's total duration. While this could be expected for PR review processes involving human approval, it may indicate the workflow was blocked or paused for an extended period.Consider tracking whether long-running review steps indicate:
- Human review bottlenecks that could benefit from notifications/escalation
- Workflow pause/resume functionality that should be explicitly recorded
- Separate "blocked" vs "active" states for better observability
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.trajectories/completed/2026-05/traj_o9cx33xn5u39.json at line 367, The trajectory JSON currently only records startedAt and endedAt for the review chapter, which obscures long pauses; update the review chapter object (the entry with startedAt/endedAt) to include explicit lifecycle fields such as state (active|blocked|paused|completed), blockedAt and resumedAt timestamps, and an optional pausedReason or reviewerEscalation flag, and emit/record a derived durationSpentActive vs durationBlocked so observers can distinguish human-review wait time from active processing and enable notifications/escalation when blocked durations exceed thresholds.packages/sdk/src/index.ts (1)
26-26: ⚡ Quick winRe-export curated GitHub helpers via
./github.jsfor a single API source.Line 26 currently bypasses the local GitHub surface, which can create drift later if curation changes.
Proposed change
export * as github from './github.js'; -export { createGitHubStep, GitHubClient } from '@agent-relay/github-primitive'; +export { createGitHubStep, GitHubClient } from './github.js';🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk/src/index.ts` at line 26, The current export line re-exports createGitHubStep and GitHubClient directly from '@agent-relay/github-primitive', bypassing the local curated surface; change the export to re-export these symbols from the local module (./github) instead so the SDK has a single source of truth—update the export statement for createGitHubStep and GitHubClient to import/export from './github' (ensuring ./github exports those same symbols).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.md:
- Line 29: The workflow text on line containing "**Apply autofix-swarm fixes in
the parent worktree at /Users/khaliqgant/Projects/AgentWorkforce/relay, not the
.relay/workspace cwd copy**" embeds a machine-specific absolute path and user
name; update that sentence to use a repo-root placeholder (e.g. "<repo-root>" or
"{REPO_ROOT}") and keep the rest of the message unchanged so it reads something
like "Apply autofix-swarm fixes in the parent worktree at <repo-root>/relay, not
the .relay/workspace cwd copy" to avoid leaking user-specific paths.
- Line 12: Replace the malformed token-format text "at*live*\*" in the decision
table with an inline code-formatted token pattern (e.g. `at*live*`) so the
prefix/wildcard renders unambiguously; search for the exact string "at*live*\*"
in the compacted decision table entry and update it to use backticks around the
intended token pattern.
In @.trajectories/completed/2026-05/traj_o9cx33xn5u39.json:
- Around line 405-409: Update the terse retrospective.summary value ("merged")
to a more descriptive sentence that captures what was accomplished; locate the
"retrospective" JSON object and change the "summary" field to something like
"Successfully added --register flag to agent-relay-broker mcp-args subcommand,
enabling token self-registration and unblocking cloud cleanup PR" or an
equivalent that names the change, the feature (--register flag /
agent-relay-broker mcp-args), and the outcome. Ensure you only modify the
"summary" string in the retrospective object so other fields (approach,
confidence) remain unchanged.
In `@packages/sdk/src/workflows/runner.ts`:
- Around line 3972-3979: The repair step currently copies context.step.cwd and
context.step.workdir which can be unset and cause repairs to run from the
agent's own directory; update the construction of the repairStep object (cwd and
workdir fields) to use the resolved context.cwd and context.workdir (already
derived from the failing deterministic execution) and fall back to
context.step.cwd/context.step.workdir only if those resolved values are
undefined, so repairStep runs in the same resolved working directory as the
failed deterministic gate.
In `@specs/slack-primitive.md`:
- Around line 237-258: The markdown fenced code block that lists the package
tree (the block starting with the line "packages/slack-primitive/") is missing a
language identifier; update its opening triple-backtick to include a language
(e.g., ```text) so the linter (MD040) passes and CI stops flagging the unlabeled
fence. Locate the block that begins with "packages/slack-primitive/" in
specs/slack-primitive.md and change the opening ``` to ```text, leaving the rest
of the block content and closing backticks unchanged.
---
Nitpick comments:
In @.trajectories/completed/2026-05/traj_o9cx33xn5u39.json:
- Line 367: The trajectory JSON currently only records startedAt and endedAt for
the review chapter, which obscures long pauses; update the review chapter object
(the entry with startedAt/endedAt) to include explicit lifecycle fields such as
state (active|blocked|paused|completed), blockedAt and resumedAt timestamps, and
an optional pausedReason or reviewerEscalation flag, and emit/record a derived
durationSpentActive vs durationBlocked so observers can distinguish human-review
wait time from active processing and enable notifications/escalation when
blocked durations exceed thresholds.
In `@packages/sdk/src/index.ts`:
- Line 26: The current export line re-exports createGitHubStep and GitHubClient
directly from '@agent-relay/github-primitive', bypassing the local curated
surface; change the export to re-export these symbols from the local module
(./github) instead so the SDK has a single source of truth—update the export
statement for createGitHubStep and GitHubClient to import/export from './github'
(ensuring ./github exports those same symbols).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4f0114db-dde3-4e11-a1e8-01c84ae307a0
📒 Files selected for processing (24)
.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.json.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.md.trajectories/completed/2026-05/traj_1775914133873_35667beb.json.trajectories/completed/2026-05/traj_1775914133873_35667beb.md.trajectories/completed/2026-05/traj_1776073106646_1839be2d.json.trajectories/completed/2026-05/traj_1776073106646_1839be2d.md.trajectories/completed/2026-05/traj_1776113772922_bc92f121.json.trajectories/completed/2026-05/traj_1776113772922_bc92f121.md.trajectories/completed/2026-05/traj_3b3p1z4y7qlo.json.trajectories/completed/2026-05/traj_3b3p1z4y7qlo.md.trajectories/completed/2026-05/traj_itgr2w8qs3xn.json.trajectories/completed/2026-05/traj_itgr2w8qs3xn.md.trajectories/completed/2026-05/traj_o9cx33xn5u39.json.trajectories/completed/2026-05/traj_o9cx33xn5u39.md.trajectories/index.jsonpackages/sdk/src/__tests__/workflow-runner.test.tspackages/sdk/src/github.tspackages/sdk/src/index.tspackages/sdk/src/workflows/README.mdpackages/sdk/src/workflows/builder.tspackages/sdk/src/workflows/runner.tspackages/sdk/src/workflows/schema.jsonpackages/sdk/src/workflows/types.tsspecs/slack-primitive.md
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.md:
- Line 5: Correct the repeated hostname typo by replacing every occurrence of
the misspelled string "orgin.agentrelay.net" with the correct
"origin.agentrelay.net" in the document; search for the exact token
"orgin.agentrelay.net" (appears four times per the review) and update each
instance so references like the production SST alias read
"origin.agentrelay.net".
In `@specs/slack-primitive.md`:
- Line 52: Update the Slack token-prefix examples in specs/slack-primitive.md to
use clear, non-ambiguous token prefixes (e.g., "xoxb-" for bot tokens and
"xoxp-" for user tokens) instead of the confusing "xoxb-_" / "xoxp-_"; locate
the sentence that currently reads with "xoxb-_" and "xoxp-_" and replace those
token examples with "xoxb-" and "xoxp-" respectively and ensure the paragraph
still explains that the primitive expects a bot user OAuth token (xoxb-) and
validates scopes (chat:write, channels:history, groups:history) on first call,
throwing a typed error if missing.
- Around line 109-110: The spec text for the timeoutSeconds property is
contradictory: it currently says both "30 min default" and "required to set
explicitly"; choose one policy and make the spec and examples consistent: either
(A) make timeoutSeconds optional with a default of 1800 — remove the "required"
language, document "default: 1800 (30 min)" and ensure any schema/validation
(e.g., the property in the primitive definition) marks it optional, or (B) make
timeoutSeconds required — remove "default" wording, state "required" and update
examples to always supply a value and update validation to reject missing
timeoutSeconds; refer to the timeoutSeconds symbol in the spec and update the
comment and any schema/validation rules accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e85259c5-bd83-4f73-b508-a6573c41a27f
📒 Files selected for processing (16)
.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.json.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.md.trajectories/completed/2026-05/traj_2tqxnib25omk.json.trajectories/completed/2026-05/traj_2tqxnib25omk.md.trajectories/completed/2026-05/traj_m7mpv7j8n78h.json.trajectories/completed/2026-05/traj_m7mpv7j8n78h.md.trajectories/completed/2026-05/traj_o9cx33xn5u39.json.trajectories/index.jsonpackages/sdk/src/__tests__/workflow-runner.test.tspackages/sdk/src/index.tspackages/sdk/src/workflows/README.mdpackages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.tspackages/sdk/src/workflows/runner.tspackages/sdk/src/workflows/schema.jsonpackages/sdk/src/workflows/types.tsspecs/slack-primitive.md
✅ Files skipped from review due to trivial changes (6)
- .trajectories/completed/2026-05/traj_2tqxnib25omk.json
- .trajectories/completed/2026-05/traj_m7mpv7j8n78h.json
- .trajectories/completed/2026-05/traj_o9cx33xn5u39.json
- packages/sdk/src/workflows/README.md
- .trajectories/index.json
- .trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.json
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/sdk/src/index.ts
- packages/sdk/src/workflows/types.ts
- packages/sdk/src/workflows/schema.json
- packages/sdk/src/workflows/runner.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.ts (1)
32-32: ⚡ Quick winReturn cloned step rows from the DB stub to avoid false-positive tests.
getStepsByRunIdcurrently returns live objects fromsteps. That can let code mutate persisted state without callingupdateStep, which weakens this contract suite.Suggested fix
- getStepsByRunId: vi.fn(async (runId: string) => [...steps.values()].filter((step) => step.runId === runId)), + getStepsByRunId: vi.fn(async (runId: string) => + [...steps.values()] + .filter((step) => step.runId === runId) + .map((step) => ({ ...step })) + ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.ts` at line 32, getStepsByRunId currently returns live objects from the in-memory steps store, allowing tests to mutate persisted state without calling updateStep; change the stubbed getStepsByRunId implementation (the vi.fn for getStepsByRunId) to return cloned step rows (e.g., map matched rows to a deep copy via structuredClone or JSON.parse(JSON.stringify(...))) instead of the original objects from steps.values(), so any mutations in test code won't alter the stored step entries and will force callers to use updateStep to persist changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.ts`:
- Line 32: getStepsByRunId currently returns live objects from the in-memory
steps store, allowing tests to mutate persisted state without calling
updateStep; change the stubbed getStepsByRunId implementation (the vi.fn for
getStepsByRunId) to return cloned step rows (e.g., map matched rows to a deep
copy via structuredClone or JSON.parse(JSON.stringify(...))) instead of the
original objects from steps.values(), so any mutations in test code won't alter
the stored step entries and will force callers to use updateStep to persist
changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9ca275b2-c414-40ed-9db1-d8babbf67f6b
📒 Files selected for processing (4)
.trajectories/completed/2026-05/traj_60qc24ufr96g.json.trajectories/completed/2026-05/traj_60qc24ufr96g.md.trajectories/index.jsonpackages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.ts
✅ Files skipped from review due to trivial changes (3)
- .trajectories/completed/2026-05/traj_60qc24ufr96g.json
- .trajectories/completed/2026-05/traj_60qc24ufr96g.md
- .trajectories/index.json
|
Addressed the remaining review feedback in 26a945b:
Validation rerun locally:
|
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
.trajectories/index.json (1)
10-10:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize trajectory
pathvalues to repo-relative paths.Lines like Line 10 and Line 256 embed machine-specific absolute paths, which leak user/environment details and make the index non-portable across dev/CI environments.
Suggested fix pattern
- "path": "/Users/khaliqgant/Projects/AgentWorkforce/relay/.trajectories/completed/2026-05/traj_1775914133873_35667beb.json", + "path": ".trajectories/completed/2026-05/traj_1775914133873_35667beb.json", - "path": "/home/runner/work/relay/relay/.trajectories/completed/2026-04/traj_0t92gxaz6igh.json", + "path": ".trajectories/completed/2026-04/traj_0t92gxaz6igh.json",Apply this normalization consistently to all changed
pathentries in this file.Also applies to: 18-18, 26-26, 34-34, 42-42, 50-50, 58-58, 66-66, 74-74, 82-82, 90-90, 98-98, 106-106, 114-114, 129-129, 137-137, 145-145, 153-153, 161-161, 169-169, 177-177, 185-185, 193-193, 208-208, 216-216, 224-224, 232-232, 240-240, 248-248, 256-256, 263-263, 270-270, 277-277, 284-284
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.trajectories/index.json at line 10, The index.json contains absolute, machine-specific "path" values (e.g., entries like "/Users/khaliqgant/Projects/AgentWorkforce/relay/.trajectories/...") which must be normalized to repo-relative paths; update every "path" value in .trajectories/index.json (all entries referenced in the review) to be relative to the repository root (for example ".trajectories/completed/2026-05/traj_1775914133873_35667beb.json") ensuring the normalization is applied consistently to each changed "path" entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/slack-primitive.md`:
- Line 150: Clarify the DM validation behavior for askQuestion: update the
statement about askQuestion against a DM so it consistently says that currently
askQuestion rejects/throws when invoked on a DM unless the workspace has granted
the im:history scope, and that full DM support (including explicit validation/UX
changes) will be implemented in v2; specifically amend the sentence that
currently asserts a hard throw ("askQuestion against a DM throws at validation
time") to state the conditional behavior and/or note that v2 will remove the
restriction, and also adjust the later line that asks whether to support DMs to
reflect this current conditional behavior rather than implying a contradiction
with existing validation. Ensure references to askQuestion, im:history, and
cloud-runtime remain in the text so readers can locate the described behavior.
- Line 280: The "Question idempotency on retry" entry is marked as an open
question but the spec already prescribes stashing (questionTs, runId, stepName)
in the workflow run record before the polling loop; update the "Open questions"
section to either remove that bullet or clarify the exact unresolved item (e.g.,
"Implementation approach defined: stash (questionTs, runId, stepName) in run
record; pending: confirm run-record schema changes and migration strategy"), and
reference the existing resumability guidance and the (questionTs, runId,
stepName) tuple so readers can reconcile the duplicate guidance.
- Around line 93-94: The spec is ambiguous about whether an unknown mention
causes the step to fail; update the wording to state that mentions resolved via
users.lookupByEmail or the user-cache that are not found do not fail the step:
the step completes with a success status but records a typed soft error
SlackPostBackError(unknown_mention) in the step output metadata so downstream
steps can detect it; explicitly state where to find it (e.g.,
stepResult.output.errors or stepResult.metadata.postBackErrors) and that
consumers should check for SlackPostBackError(unknown_mention) to treat it as a
warning rather than a hard failure (same soft-error behavior as
github-primitive).
- Line 52: The spec's scope list is incomplete: add reactions:write and
im:history to the documented scopes and clearly separate "required" (e.g.,
chat:write, channels:history, groups:history) vs "optional/for-DMs/reactions"
(e.g., im:history, reactions:write) in specs/slack-primitive.md, and update the
primitive's scope-validation logic (the validateScopes / first-call scope check)
to enforce only required scopes while warning or listing optional scopes in the
typed error message so missing optional scopes are not treated as fatal.
---
Duplicate comments:
In @.trajectories/index.json:
- Line 10: The index.json contains absolute, machine-specific "path" values
(e.g., entries like
"/Users/khaliqgant/Projects/AgentWorkforce/relay/.trajectories/...") which must
be normalized to repo-relative paths; update every "path" value in
.trajectories/index.json (all entries referenced in the review) to be relative
to the repository root (for example
".trajectories/completed/2026-05/traj_1775914133873_35667beb.json") ensuring the
normalization is applied consistently to each changed "path" entry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7fb186db-0fe1-4316-98eb-a321be75261c
📒 Files selected for processing (7)
.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.json.trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.md.trajectories/completed/2026-05/traj_vkozdglobkyg.json.trajectories/completed/2026-05/traj_vkozdglobkyg.md.trajectories/index.jsonpackages/sdk/src/workflows/__tests__/workflow-reliability-contract.test.tsspecs/slack-primitive.md
✅ Files skipped from review due to trivial changes (3)
- .trajectories/completed/2026-05/traj_vkozdglobkyg.json
- .trajectories/completed/2026-05/traj_vkozdglobkyg.md
- .trajectories/compacted/compact_j5u7qhaw4q6a_2026-05-08.json
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/sdk/src/workflows/tests/workflow-reliability-contract.test.ts
|
|
||
| Cloud's lambda already wires `Resource.NangoSecretKey.value` and resolves `(workspaceId, provider) → connectionId`. The Slack primitive's `cloud-runtime` reuses that resolver — no new resource binding. | ||
|
|
||
| For Slack, we expect the connection to be a **bot user OAuth token** (`xoxb-*`), not user-token (`xoxp-*`). Posting and reading replies both work with `chat:write`, `channels:history`, and `groups:history` scopes. The primitive validates scopes on first call and throws a typed error early if they're missing. |
There was a problem hiding this comment.
Verify that the scope list is complete.
The spec lists chat:write, channels:history, and groups:history as required scopes. However, line 139 describes adding a :eyes: reaction, which typically requires the reactions:write scope. Additionally, line 276 mentions im:history for potential DM support. Consider explicitly documenting all scopes that may be needed for the full feature set, or clarifying which scopes are required vs. optional for different operations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/slack-primitive.md` at line 52, The spec's scope list is incomplete:
add reactions:write and im:history to the documented scopes and clearly separate
"required" (e.g., chat:write, channels:history, groups:history) vs
"optional/for-DMs/reactions" (e.g., im:history, reactions:write) in
specs/slack-primitive.md, and update the primitive's scope-validation logic (the
validateScopes / first-call scope check) to enforce only required scopes while
warning or listing optional scopes in the typed error message so missing
optional scopes are not treated as fatal.
|
|
||
| - **Mentions are resolved before send.** `@khaliq` is looked up via `users.lookupByEmail` or the user-cache; if not found, the message still posts but a typed `SlackPostBackError(unknown_mention)` is logged on the step output. This is the same "fail soft on cosmetic errors, fail hard on real errors" pattern as github-primitive. |
There was a problem hiding this comment.
Clarify error-handling semantics for unknown_mention.
Line 94 states that when a mention isn't found, "a typed SlackPostBackError(unknown_mention) is logged on the step output" and "the message still posts." It's unclear whether the step succeeds or fails from the workflow's perspective. Does the step complete successfully with error metadata attached, or does it fail? Please specify the step's final status and how downstream steps can access or check for this soft error.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/slack-primitive.md` around lines 93 - 94, The spec is ambiguous about
whether an unknown mention causes the step to fail; update the wording to state
that mentions resolved via users.lookupByEmail or the user-cache that are not
found do not fail the step: the step completes with a success status but records
a typed soft error SlackPostBackError(unknown_mention) in the step output
metadata so downstream steps can detect it; explicitly state where to find it
(e.g., stepResult.output.errors or stepResult.metadata.postBackErrors) and that
consumers should check for SlackPostBackError(unknown_mention) to treat it as a
warning rather than a hard failure (same soft-error behavior as
github-primitive).
|
|
||
| - **Workflows must be allowed to block on external input.** The runner already supports long-running steps (verification gates, sandbox bootstraps), so this is reusing existing plumbing — not inventing new lifecycle. | ||
| - **The step must be resumable.** If the workflow crashes between posting the question and receiving the answer, the resumed run must find the existing question (by run-id-tagged metadata in the message) and continue waiting from there, not re-ask. Implementation: stash `(questionTs, runId, stepName)` in the workflow run record before the polling loop starts; on resume, look up the row and rejoin the poll. | ||
| - **The channel's history must include the question.** This means cloud-runtime cannot use private DMs (the bot can't read DM history without `im:history` scope and that scope is rarely granted). `askQuestion` against a DM throws at validation time. |
There was a problem hiding this comment.
Resolve the DM validation inconsistency.
Line 150 states "askQuestion against a DM throws at validation time," but line 276 asks "Should askQuestion to a DM be supported when im:history is granted? Probably yes...Defer to v2." These statements contradict each other.
Clarify whether:
- (A) DM usage currently throws an error (as line 150 states), and v2 will add support when the scope is present, or
- (B) DM support is already conditional based on scope availability
Update one of these sections to remove the ambiguity.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/slack-primitive.md` at line 150, Clarify the DM validation behavior for
askQuestion: update the statement about askQuestion against a DM so it
consistently says that currently askQuestion rejects/throws when invoked on a DM
unless the workspace has granted the im:history scope, and that full DM support
(including explicit validation/UX changes) will be implemented in v2;
specifically amend the sentence that currently asserts a hard throw
("askQuestion against a DM throws at validation time") to state the conditional
behavior and/or note that v2 will remove the restriction, and also adjust the
later line that asks whether to support DMs to reflect this current conditional
behavior rather than implying a contradiction with existing validation. Ensure
references to askQuestion, im:history, and cloud-runtime remain in the text so
readers can locate the described behavior.
| - **Slack Connect / shared channels.** The primitive should treat shared channels exactly like internal ones — the bot just needs to be invited. Need to verify Nango's Slack provider exposes them correctly. | ||
| - **Audit trail.** Cloud should write every `askQuestion` exchange to the workflow run record so post-mortems can see what the agent asked and how the human answered. This is straightforward but needs schema work; out of scope for the primitive itself. | ||
| - **Default channel resolution.** If the workflow doesn't specify a channel, should the primitive default to the workspace's "wf-default" channel? I think no — the workflow author should be explicit. But cloud could surface the default as `Resource.SlackDefaultChannel.value` for convenience. | ||
| - **Question idempotency on retry.** When a step retries (e.g. `retries: 2`), the second attempt should _not_ re-ask. The primitive should check the channel for an existing question with the same `(runId, stepName)` tag and resume waiting. Mentioned above under resumability — calling out here as the same mechanism. |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Clarify whether question idempotency is settled or open.
Line 280 lists "Question idempotency on retry" as an open question, noting it's "Mentioned above under resumability — calling out here as the same mechanism." However, line 149 already provides specific implementation guidance: "stash (questionTs, runId, stepName) in the workflow run record before the polling loop starts."
If the implementation approach is already defined, remove this from "Open questions" or clearly state what aspect remains unresolved (e.g., "Implementation approach is defined; pending: confirmation of run-record schema changes").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/slack-primitive.md` at line 280, The "Question idempotency on retry"
entry is marked as an open question but the spec already prescribes stashing
(questionTs, runId, stepName) in the workflow run record before the polling
loop; update the "Open questions" section to either remove that bullet or
clarify the exact unresolved item (e.g., "Implementation approach defined: stash
(questionTs, runId, stepName) in run record; pending: confirm run-record schema
changes and migration strategy"), and reference the existing resumability
guidance and the (questionTs, runId, stepName) tuple so readers can reconcile
the duplicate guidance.
Summary
repairRetriesso existing workflows do not silently spawn repair agents./githubsurface and clean up review-noted markdown/trajectory textTest Plan
npm --prefix packages/sdk run checknpx vitest run --root packages/sdk --config vitest.config.ts src/__tests__/workflow-runner.test.ts -t "deterministic"npx vitest run --root packages/sdk --config vitest.config.ts src/workflows/__tests__/workflow-reliability-contract.test.tsgit diff --checkNote: the full local
workflow-runner.test.tsfile still hits an existing trajectory harness failure (readCompletedTrajectoryFilereturns null inshould record review completion in trajectory with decision and reason). The deterministic repair tests touched by this PR pass, and CI for the prior commit was green.