Handle missing live high-watermark route #325

Merged
khaliqgant merged 1 commit into main from ar-249-live-high-watermark-404-fallback on Jun 14, 2026

Conversation

khaliqgant (Member) commented Jun 14, 2026

Summary

  • detect 404/route-not-found failures from the live event high-watermark lookup
  • run the existing ready issue full-pull/reconcile path before opening the live event stream when that route is unavailable
  • cover fallback, no-fallback with a present watermark, and no re-dispatch after startup reconciliation

Tests

  • npx vitest run packages/factory-sdk/src/orchestrator/factory.test.ts
  • npx vitest run packages/factory-sdk/src
  • npm run typecheck

coderabbitai Bot commented Jun 14, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 351d1cf4-4ad2-4026-a217-833deaf12809

📥 Commits

Reviewing files that changed from the base of the PR and between 7d7e835 and 33f1820.

📒 Files selected for processing (2)
  • packages/factory-sdk/src/orchestrator/factory.test.ts
  • packages/factory-sdk/src/orchestrator/factory.ts

📝 Walkthrough

FactoryLoop live-subscription startup now detects when the event high-watermark route is unavailable (HTTP 404 or route_not_found). A new EventHighWatermarkResult type and isHighWatermarkRouteUnavailable helper drive a conditional fallback to runOnce() full pull. Three new tests cover the fallback path, the skip path, and duplicate-suppression behavior.
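
The startup ordering described in this walkthrough can be sketched as a small pure function. This is an illustration only, not the code in factory.ts: `planLiveStartup`, the `Step` labels, and the `watermark` field are hypothetical names. Only `routeUnavailable` and the `runOnce()` full pull appear in the PR itself.

```typescript
// Assumed shape: the PR confirms `routeUnavailable`; `watermark` is a guess.
interface EventHighWatermarkResult {
  watermark?: string
  routeUnavailable: boolean
}

// Hypothetical labels for the startup actions named in the walkthrough.
type Step = 'full-pull' | 'refresh-heartbeat' | 'open-stream'

// Returns the ordered startup steps for a given high-watermark lookup result.
function planLiveStartup(result: EventHighWatermarkResult): Step[] {
  const steps: Step[] = []
  if (result.routeUnavailable) {
    // Route is missing (404 / route_not_found): reconcile with a full pull first.
    steps.push('full-pull', 'refresh-heartbeat')
  }
  // The live event stream is opened in both cases.
  steps.push('open-stream')
  return steps
}
```

Keeping the decision in a pure function like this makes the fallback and skip paths easy to assert on without a live mount.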

Changes

Live subscription high-watermark route-unavailable fallback

| Layer / File(s) | Summary |
| --- | --- |
| EventHighWatermarkResult type and route-unavailable classifier (packages/factory-sdk/src/orchestrator/factory.ts) | Adds the internal EventHighWatermarkResult type and isHighWatermarkRouteUnavailable function that checks HTTP 404 status codes, details.code === 'route_not_found', and error message string matching to classify whether a high-watermark fetch failure means the route is missing. |
| #currentEventHighWatermark refactor and #startLiveSubscription fallback (packages/factory-sdk/src/orchestrator/factory.ts) | Reworks #currentEventHighWatermark to return EventHighWatermarkResult (incrementing a counter and computing routeUnavailable on error). Updates #startLiveSubscription to execute runOnce() and refresh the live heartbeat when routeUnavailable is true. |
| Test scaffolding and three new live-subscribe tests (packages/factory-sdk/src/orchestrator/factory.test.ts) | Adds ThrowingWatermarkMount, CountingListTreeMount, and RouteNotFoundCountingListTreeMount. Adds three tests: route-unavailable triggers startup full pull with correct listTree prefixes and counter increments; route-available skips the full pull; startup-dispatched issue is not re-dispatched on a later live event for the same issue. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/pear#276: Modifies #startLiveSubscription and live heartbeat refresh logic in factory.ts, which the high-watermark fallback path in this PR calls directly after runOnce().
  • AgentWorkforce/pear#245: Introduces the live-mode startup plumbing in FactoryLoop and corresponding factory.test.ts coverage that this PR extends with the route-unavailable fallback.

Poem

🐇 Hoppity-hop through the watermark gate,
If the route's not there, no reason to wait!
A full pull I'll fetch, with prefixes in tow,
Duplicate issues? Oh no, let them go!
The counters tick up, the heartbeat refreshed,
This bunny's live feed is thoroughly blessed. 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: handling the case when the live high-watermark route is missing/unavailable. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear context about detecting route failures, implementing fallback logic, and covering the test scenarios. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed due to a network error.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a fallback mechanism in the factory orchestrator to run a startup full pull when the high-watermark route is unavailable. It introduces a helper function isHighWatermarkRouteUnavailable to detect route-not-found errors and adds corresponding unit tests. Feedback suggests making the error detection helper more robust by checking nested response data and safely inspecting error messages.

Comment on lines +2708 to +2722
const isHighWatermarkRouteUnavailable = (error: unknown): boolean => {
  const details = asRecord(error) ?? {}
  const response = asRecord(details.response) ?? {}
  const status = details.status ?? details.statusCode ?? response.status ?? response.statusCode
  if (status === 404 || status === '404') {
    return true
  }

  const code = stringValue(details.code)?.toLowerCase()
  if (code === 'route_not_found') {
    return true
  }

  return error instanceof Error && /route not found/i.test(error.message)
}

Severity: medium

The isHighWatermarkRouteUnavailable helper checks for details.code and error.message to identify route-not-found errors. However, in many HTTP clients (like Axios), server-returned error details (such as custom error codes or messages) are nested inside the response body (i.e., response.data). Additionally, if an error is serialized/deserialized across process boundaries, it may lose its Error prototype, causing error instanceof Error to evaluate to false.

To make this check more robust, we should also inspect response.data for code and message, and safely check for message on plain objects.

const isHighWatermarkRouteUnavailable = (error: unknown): boolean => {
  const details = asRecord(error) ?? {}
  const response = asRecord(details.response) ?? {}
  const data = asRecord(response.data) ?? {}
  const status = details.status ?? details.statusCode ?? response.status ?? response.statusCode
  if (status === 404 || status === '404') {
    return true
  }

  const code = (stringValue(details.code) ?? stringValue(data.code))?.toLowerCase()
  if (code === 'route_not_found') {
    return true
  }

  const message = stringValue(details.message) ?? stringValue(data.message) ?? (error instanceof Error ? error.message : '')
  return /route not found/i.test(message)
}
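
To see what the suggested version would accept, the sketch below reproduces it alongside stand-in helpers and a few error shapes. The `asRecord` and `stringValue` implementations here are assumptions (the PR does not show them), and the sample errors are invented for illustration rather than taken from any real client.

```typescript
// Assumed helper: treats any non-null object as a record, otherwise undefined.
const asRecord = (value: unknown): Record<string, unknown> | undefined =>
  typeof value === 'object' && value !== null ? (value as Record<string, unknown>) : undefined

// Assumed helper: returns the value only when it is a string.
const stringValue = (value: unknown): string | undefined =>
  typeof value === 'string' ? value : undefined

// The reviewer's suggested classifier, reproduced so the example is self-contained.
const isHighWatermarkRouteUnavailable = (error: unknown): boolean => {
  const details = asRecord(error) ?? {}
  const response = asRecord(details.response) ?? {}
  const data = asRecord(response.data) ?? {}
  const status = details.status ?? details.statusCode ?? response.status ?? response.statusCode
  if (status === 404 || status === '404') {
    return true
  }
  const code = (stringValue(details.code) ?? stringValue(data.code))?.toLowerCase()
  if (code === 'route_not_found') {
    return true
  }
  const message =
    stringValue(details.message) ?? stringValue(data.message) ?? (error instanceof Error ? error.message : '')
  return /route not found/i.test(message)
}

// Invented error shapes covering each branch.
const nestedStatus = { response: { status: 404 } }
const nestedCode = { response: { status: 400, data: { code: 'ROUTE_NOT_FOUND' } } }
const plainObjectMessage = { message: 'Route not found' } // lost its Error prototype
const unrelated = new Error('socket hang up')

const results = [nestedStatus, nestedCode, plainObjectMessage, unrelated].map(isHighWatermarkRouteUnavailable)
console.log(results) // [ true, true, true, false ]
```

The second and third shapes are the ones the original helper would miss: a code nested under `response.data`, and a message on a plain object that fails `instanceof Error`.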

@agent-relay-code

Contributor

pr-reviewer could not complete review for #325 in AgentWorkforce/pear.
The review harness exited with code 1.
No review was posted; this needs operator attention.

@khaliqgant khaliqgant closed this Jun 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33f1820d4b

if (highWatermark.routeUnavailable) {
  this.#increment('liveHighWatermarkFullPullFallbacks')
  this.#logger.info?.('[factory] live subscription high-watermark route unavailable; running startup full pull before event stream')
  await this.runOnce()

P1: Close the startup blind spot before the fallback pull

When the high-watermark route is unavailable, this runs the full pull before any live subscription is registered. If an issue becomes Ready during runOnce() after its paths have already been listed, its change event is emitted while no listener exists, and the later { from: 'now' } subscription starts after that event, so neither the pull nor the stream dispatches it. Register the stream or otherwise capture a cursor before doing the fallback pull so live startup has no event-loss window.
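
One way to close that window is to register the listener first and hold its events until the pull finishes. The sketch below is a minimal illustration under stated assumptions: `DeferredEventGate`, `dispatchOnce`, and the issue ids are hypothetical and do not come from factory.ts, and a simple `Set` stands in for whatever dedupe the orchestrator uses.

```typescript
// Buffers events while a startup pull is in flight, then replays them in order.
class DeferredEventGate<E> {
  private readonly buffer: E[] = []
  private deferring = true
  private readonly handle: (event: E) => void

  constructor(handle: (event: E) => void) {
    this.handle = handle
  }

  // Called by the stream listener for every live event.
  push(event: E): void {
    if (this.deferring) {
      this.buffer.push(event)
    } else {
      this.handle(event)
    }
  }

  // Called once the fallback pull completes; later events pass straight through.
  release(): void {
    this.deferring = false
    for (const event of this.buffer.splice(0)) {
      this.handle(event)
    }
  }
}

// Dedupe so an issue seen by both the pull and the stream is dispatched once.
const dispatched = new Set<string>()
const dispatchLog: string[] = []
const dispatchOnce = (issueId: string): void => {
  if (dispatched.has(issueId)) return
  dispatched.add(issueId)
  dispatchLog.push(issueId)
}

const gate = new DeferredEventGate<string>(dispatchOnce)
gate.push('issue-1')     // live event overlapping with the pull: buffered
dispatchOnce('issue-1')  // the pull dispatches what it listed
gate.push('issue-2')     // became Ready after listing: buffered, not lost
gate.release()           // pull done: issue-1 is suppressed, issue-2 dispatched
gate.push('issue-3')     // after release, events are handled immediately
console.log(dispatchLog) // [ 'issue-1', 'issue-2', 'issue-3' ]
```

The trade-off is a short delay for events that arrive mid-pull, in exchange for never dropping one.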

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

Comment on lines +333 to +338
if (highWatermark.routeUnavailable) {
  this.#increment('liveHighWatermarkFullPullFallbacks')
  this.#logger.info?.('[factory] live subscription high-watermark route unavailable; running startup full pull before event stream')
  await this.runOnce()
  await this.#refreshLiveHeartbeatIfDue()
}

🚩 runOnce() exceptions during startup full pull propagate and abort factory start

When the high-watermark route is unavailable (404), runOnce() is called at factory.ts:336 without a try-catch. Unlike #handleChange() (which wraps dispatch errors per-issue at factory.ts:1062), runOnce() lets dispatch() exceptions propagate at factory.ts:801. If any single issue fails to dispatch (e.g., spawn failure), the entire runOnce() throws, which propagates through #startLiveSubscription() up to #start() at factory.ts:243, setting this.#started = false and aborting the factory start. In the runLoop context, this is handled by retry logic (factory.ts:829-844), but in the live startup context there is no retry — the caller must re-invoke start(). This is consistent with how #start() handles other failures (e.g. ensureSubRoot), but it does mean a single transient dispatch failure during the 404-fallback full pull prevents the factory from starting, whereas the same failure in normal live mode (without the 404 fallback) would never occur because runOnce is never called.
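
A guarded variant of the startup pull might look like the sketch below. It is a standalone illustration, not the orchestrator's code: `StartupDeps` and `runStartupPullGuarded` are hypothetical names, and the dependencies are injected so the example runs on its own. The counter name `liveHighWatermarkFullPullErrors` is the one used in the follow-up commit on this PR.

```typescript
// Hypothetical dependency bundle standing in for FactoryLoop's private members.
interface StartupDeps {
  runOnce: () => Promise<void>
  increment: (counter: string) => void
  logError: (message: string, error: unknown) => void
}

// Runs the fallback full pull without letting a failure abort startup.
// Resolves true when the pull succeeded, false when startup should degrade
// to the live stream alone.
async function runStartupPullGuarded(deps: StartupDeps): Promise<boolean> {
  try {
    await deps.runOnce()
    return true
  } catch (error) {
    deps.increment('liveHighWatermarkFullPullErrors')
    deps.logError('[factory] startup full pull failed; continuing with live stream', error)
    return false
  }
}
```

Returning a boolean rather than rethrowing keeps the decision about retries with the caller, at the cost of possibly starting the stream with an unreconciled backlog.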

@khaliqgant khaliqgant reopened this Jun 14, 2026
@khaliqgant khaliqgant merged commit b8802be into main Jun 14, 2026
9 checks passed
@khaliqgant khaliqgant deleted the ar-249-live-high-watermark-404-fallback branch June 14, 2026 09:49
khaliqgant added a commit that referenced this pull request Jun 14, 2026
Two P1s flagged on #325's startup full-pull fallback (now in main):

1. Unguarded startup runOnce() aborted daemon start. runOnce() calls
   #readyIssuePaths() unguarded, so a transient pull/listTree failure
   propagated out of #startLiveSubscription and killed `factory start` — a
   startup-resilience regression vs. the prior "watermark undefined -> continue".
   Now wrapped: on failure, increment liveHighWatermarkFullPullErrors, log via
   #error, and degrade to the live stream instead of going down.

2. Pull-before-subscribe blind spot. The full pull ran before the subscription
   registered, so an issue going Ready *during* the pull emitted an event with
   no listener and was lost. Now the subscription registers BEFORE the pull;
   events buffer via a new #deferLiveEventDrain gate and drain once the pull
   completes. Batch dedupe suppresses overlap with issues the pull dispatched.

Tests: startup-pull-throws keeps the daemon up (start resolves + counter); an
issue arriving via a live event mid-pull is captured and dispatched. Full
factory-sdk suite green (339), typecheck clean.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>