
[No QA] Add SEQUENTIAL_QUEUE.md #93279

Open
adhorodyski wants to merge 12 commits into Expensify:main from callstack-internal:sq-audit

Conversation

@adhorodyski

@adhorodyski adhorodyski commented Jun 11, 2026

Contributor

@mountiny

Explanation of Change

Adds contributingGuides/SEQUENTIAL_QUEUE.md, an observational reference document for the SequentialQueue subsystem — the offline-first WRITE queue that serialises every mutating API call.

Documentation-only: the new doc plus one cspell.json word entry. No runtime code changes.

The doc mirrors the style and scope of the sibling NETWORK_STATE_DETECTION.md (problem → how it works today → sharp edges per block) and refers to code by module and function names, not line numbers, so it survives refactors. It covers:

  • The coordinator (SequentialQueue) and its 7-variable state machine
  • The lifecycle of one request, and restart-recovery of a persisted queue
  • PersistedRequests — the durable store and its in-memory-authoritative inversion pattern
  • Where a request actually hits disk: push() awaits the Onyx disk commit before flushing, the remaining fire-and-forget persist windows, and the null/MemoryOnlyProvider/retry caveats on the commit handle
  • RequestThrottle — jittered exponential back-off and the give-up signal
  • QueuedOnyxUpdates / queueFlushedData — the two anti-flicker buffers
  • API.write / API.read / makeRequestWithSideEffects public contract and the inbound consumers that drive the queue
  • The error-handling ladder (per error class: retries, Onyx data applied, modal)
  • Offline behaviour, pause / data-gap sync, multi-tab leader election
  • The middleware chain boundary and conflict resolution (the twice-evaluated resolver)
  • Test coverage as evidence of the intended contract
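
For readers new to the RequestThrottle item in the list above, jittered exponential back-off with a give-up signal can be sketched roughly as follows. This is an illustration only and not the code added or documented by this PR: the class name `ThrottleSketch`, its method names, and all three constants are assumptions made for the example.

```typescript
// All names and values below are hypothetical; the real constants live in the App codebase.
const INITIAL_WAIT_MS = 100;
const MAX_WAIT_MS = 10_000;
const MAX_RETRIES = 10;

class ThrottleSketch {
    private waitMs = 0;

    private retries = 0;

    private readonly random: () => number;

    // The random source is injectable so the jitter can be made deterministic in tests.
    constructor(random: () => number = Math.random) {
        this.random = random;
    }

    /** Reset the back-off state, e.g. after a request succeeds. */
    clear(): void {
        this.waitMs = 0;
        this.retries = 0;
    }

    /**
     * Returns how long to wait before the next retry.
     * Throws once the retry budget is spent; that throw is the "give-up signal".
     */
    nextWaitMs(): number {
        if (this.retries >= MAX_RETRIES) {
            throw new Error('Retry budget exhausted');
        }
        this.retries += 1;
        if (this.waitMs === 0) {
            // First retry: base delay plus random jitter so clients do not retry in lockstep.
            this.waitMs = INITIAL_WAIT_MS + Math.floor(this.random() * INITIAL_WAIT_MS);
        } else {
            // Later retries: double the previous delay, capped at the maximum.
            this.waitMs = Math.min(this.waitMs * 2, MAX_WAIT_MS);
        }
        return this.waitMs;
    }
}
```

A caller would sleep for `nextWaitMs()` milliseconds before each retry, call `clear()` on success, and treat the thrown error as the point where the request is abandoned.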

Fixed Issues

$ #93422
PROPOSAL:

Tests

  1. Render contributingGuides/SEQUENTIAL_QUEUE.md in a Markdown viewer (GitHub preview, VS Code, etc.) and verify headings, code blocks, and internal anchor links render correctly.
  2. Confirm no broken relative links (the doc references sibling files in contributingGuides/ that already exist).
  • Verify that no errors appear in the JS console

Offline tests

N/A

QA Steps

Same as tests

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If new assets were added or existing ones were modified, I verified that:
    • The assets are optimized and compressed (for SVG files, run npm run compress-svg)
    • The assets load correctly across all supported platforms.
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari

@adhorodyski adhorodyski changed the title [No QA] Add contributingGuides/SEQUENTIAL_QUEUE.md — current-state reference for the offline request queue [No QA] Add contributingGuides/SEQUENTIAL_QUEUE.md Jun 11, 2026
@adhorodyski adhorodyski changed the title [No QA] Add contributingGuides/SEQUENTIAL_QUEUE.md [No QA] Add SEQUENTIAL_QUEUE.md Jun 11, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adhorodyski adhorodyski marked this pull request as ready for review June 11, 2026 14:04
@adhorodyski adhorodyski requested a review from a team as a code owner June 11, 2026 14:04
@melvin-bot melvin-bot Bot requested review from arosiclair and removed request for a team June 11, 2026 14:04
@melvin-bot

melvin-bot Bot commented Jun 11, 2026


@arosiclair Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

@mountiny mountiny self-requested a review June 12, 2026 11:49
@adhorodyski

Contributor Author

@codex review

@mountiny mountiny requested a review from Copilot June 12, 2026 11:49
@mountiny

Contributor

Reviewer Checklist

  • I have verified the author checklist is complete (all boxes are checked off).
  • I verified the correct issue is linked in the ### Fixed Issues section above
  • I verified testing steps are clear and they cover the changes made in this PR
    • I verified the steps for local testing are in the Tests section
    • I verified the steps for Staging and/or Production testing are in the QA steps section
    • I verified the steps cover any possible failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
  • I checked that screenshots or videos are included for tests on all platforms
  • I included screenshots or videos for tests on all platforms
  • I verified that the composer does not automatically focus or open the keyboard on mobile unless explicitly intended. This includes checking that returning the app from the background does not unexpectedly open the keyboard.
  • I verified tests pass on all platforms & I tested again on:
    • Android: HybridApp
    • Android: mWeb Chrome
    • iOS: HybridApp
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • If there are any errors in the console that are unrelated to this PR, I either fixed them (preferred) or linked to where I reported them in Slack
  • I verified proper code patterns were followed (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick).
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I verified that this PR follows the guidelines as stated in the Review Guidelines
  • I verified other components that can be impacted by these changes have been tested, and I retested again (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar have been tested & I retested again)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • For any bug fix or new feature in this PR, I verified that sufficient unit tests are included to prevent regressions in this flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR reviewer checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: HybridApp
Android: mWeb Chrome
iOS: HybridApp
iOS: mWeb Safari
MacOS: Chrome / Safari

Copilot AI left a comment

Contributor


Pull request overview

Adds an internal reference document describing the SequentialQueue subsystem (offline-first serialized WRITE queue), intended as a sibling to existing architecture docs like NETWORK_STATE_DETECTION.md.

Changes:

  • Added contributingGuides/SEQUENTIAL_QUEUE.md documenting SequentialQueue architecture, lifecycle, collaborators, edge cases, and test “contract” references.
  • Updated cspell.json to whitelist new technical terms used by the documentation (beforeunload, reauthenticating).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cspell.json Adds new allowed terms needed for the new guide’s vocabulary.
contributingGuides/SEQUENTIAL_QUEUE.md New in-depth subsystem documentation for SequentialQueue behavior, boundaries, and known sharp edges/open questions.


@chatgpt-codex-connector


Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 98675633f8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@adhorodyski

Contributor Author

@MelvinBot review this PR for being true to reality. Can you confirm all the claims from the new document?

@adhorodyski

Contributor Author

Let me also update the PR since #91734 got merged over the weekend.

adhorodyski and others added 3 commits June 15, 2026 14:28
The QueuedOnyxUpdates not-signed-in filter preserves 15 keys, not 14.
Also name the focus-mode NVP and the three RAM-only flags explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR Expensify#91734 made push() async and await the Onyx disk commit before
flushing, closing the enqueue crash window the doc centred on. Rewrite
the affected sections to current reality:

- push() awaits persistencePromise before flush(false); enqueue write is
  now gated ahead of the network (was: fired off in-memory state)
- redraw the disk/network ordering diagram (commit precedes network)
- isReadyPromise mechanism: starts resolved, armed via
  setIsReadyPromisePending(); add isReadyPromisePending (7 state vars)
- four resolveIsReadyPromise sites (finally + all-empty/not-leader guards
  + offline-during-persist in push), not just the finally
- persist .catch handlers now Log.alert (storage emergency)
- note new SequentialQueueTest coverage; clarify persist-before-fire vs
  persist-before-optimistic (the latter still open)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adhorodyski

adhorodyski commented Jun 15, 2026

Contributor Author

@mountiny as noted in the PR explanation, there are some open questions for now that I couldn't yet answer or guess the intent of:

An Open Questions (needs maintainer confirmation) appendix with 5 intent-ambiguous items

@mountiny

Contributor

@MelvinBot Can you do a thorough review of this PR?

@mountiny

Contributor

@codex review

@MelvinBot

Contributor

Review — [No QA] Add SEQUENTIAL_QUEUE.md

Docs-only (one new reference doc + one cspell.json word). No runtime code, so the only real risk is accuracy. I fact-checked the doc's substantive claims against the source on this branch (99ac549) — async ordering, the state machine, middleware registration order, the error ladder, exact constants, the preserved-keys allowlist, the "dead code" / "stale comment" observations, and the test-coverage map.

Verdict: accurate and genuinely useful. ~60 specific claims checked; all but one verify cleanly, including subtle ones (the 8-vs-7 state-var grouping, the four resolveIsReadyPromise sites, the three-legged all-empty guard, clear()'s untracked bare ongoing-key write, the exact 15-key allowlist count, and that persistWhenOngoing is never assigned in production). The Open Questions appendix holds up — the dead SaveResponseInOnyx guard, the stale "must be last" comment (API/index.ts:53 with FraudMonitoring added after), and the vestigial persistWhenOngoing are all real and worth a maintainer's eyes.

One correction needed

getLength() does not gate read-blocking. The doc says it twice — line 175 ("API.index reads this to decide whether a READ must wait for writes") and line 325 ("getLength() (read-only) to decide read-blocking"). In reality waitForWrites always returns waitForSequentialQueueIdle(); getLength() only feeds a log line:

function waitForWrites(command) {
    if (getPersistedRequestsLength() > 0) {
        Log.info(`[API] '${command}' is waiting on ${getPersistedRequestsLength()} write commands`);
    }
    return waitForSequentialQueueIdle();   // <-- always awaited, regardless of length
}

src/libs/API/index.ts:239-243

The actual read-blocking decision is made by isReadyPromise (it starts resolved, so a read proceeds immediately when no write is pending) — which the doc already explains well elsewhere. Suggest rewording both spots so getLength() isn't presented as the gate. The "counts the ongoing request → getLength() === 0 means truly idle" half is correct and worth keeping.
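
The gate behaviour described above (starts resolved, gets armed when a write is pending, released when the queue goes idle) can be sketched as below. The identifiers `isReadyPromise` and `resolveIsReadyPromise` come from this thread, but the function bodies, the factory `createReadGateSketch`, and the `ReadGate` type are assumptions for illustration and are not copied from SequentialQueue.ts.

```typescript
// Hypothetical shape for the sketch; not an interface from the App codebase.
type ReadGate = {
    arm: () => void;
    release: () => void;
    waitForIdle: () => Promise<void>;
    isPending: () => boolean;
};

function createReadGateSketch(): ReadGate {
    // Starts resolved, so a READ proceeds immediately when no write is pending.
    let isReadyPromise: Promise<void> = Promise.resolve();
    let resolveIsReadyPromise: () => void = () => {};
    let isPending = false;

    return {
        // Roughly what arming the gate would do: swap in a fresh unresolved promise.
        arm() {
            if (isPending) {
                return;
            }
            isPending = true;
            isReadyPromise = new Promise<void>((resolve) => {
                resolveIsReadyPromise = resolve;
            });
        },
        // Releasing resolves the current promise; anything awaiting it continues.
        release() {
            isPending = false;
            resolveIsReadyPromise();
        },
        // A READ always awaits this, regardless of how many writes are queued.
        waitForIdle: () => isReadyPromise,
        isPending: () => isPending,
    };
}
```

Under these assumptions a queue-length check is not needed to decide whether a READ waits: an unarmed gate returns an already-resolved promise, so awaiting it costs only a microtask.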

Optional nitpicks (non-blocking)

  • Line 514 / line 196 etc. describe Reauthentication as throwing 'Failed to reauthenticate'. It throws new Error('Failed to reauthenticate') (Middleware/Reauthentication.ts:117) — the string is the message. Fine as shorthand; flagging only for completeness.
  • Sibling links (NETWORK_STATE_DETECTION.md, philosophies/OFFLINE.md) both resolve. 👍

Nice work — this is a high-fidelity reference. Happy to approve once the getLength() wording is tweaked.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99ac5490e6


Comment thread contributingGuides/SEQUENTIAL_QUEUE.md Outdated
Comment thread contributingGuides/SEQUENTIAL_QUEUE.md Outdated
- getLength() does not gate read-blocking: drop the claim; waitForWrites
  always awaits waitForSequentialQueueIdle() (isReadyPromise) regardless
  of count (MelvinBot)
- Reauthentication throws new Error('Failed to reauthenticate') (MelvinBot)
- Qualify persist-before-fire to post-init only; pre-init save() returns
  Promise.resolve() and the enqueue write is deferred + un-awaited (Codex P2)
- isQueuePaused is a data-gap/deferred-update pause only, not offline;
  offline is a separate isOfflineNetwork() check. Add Open Question #6 on
  the misleading source comment (Codex P3)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adhorodyski

Contributor Author

@MelvinBot thanks — both confirmed and fixed in 31d1f51:

  • getLength() read-blocking: removed the claim and corrected the API.index consumer entry — it doesn't gate reads; waitForWrites always awaits waitForSequentialQueueIdle() (isReadyPromise) regardless of the count.
  • Reauthentication: now reads new Error('Failed to reauthenticate') in both spots.

@adhorodyski

Contributor Author

I think there are now more and more open questions in the doc that need to be addressed before merging.

adhorodyski and others added 6 commits June 16, 2026 13:35
Fold the persistWhenOngoing, dead !onyxUpdates guard term, stale
"must be last" comment, and inaccurate isQueuePaused offline comment into
their relevant blocks as present-tense facts. Open Questions now holds only
the two items that genuinely need a maintainer to ratify: silent give-up
data loss, and knownOngoingRequestIDs sufficiency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both remaining items were "the code clearly shows what happens; only
whether it's intended was unclear." Under document-AS-IS, that intent
speculation comes out: the behaviors now stand as plain observations in
the Error Handling and PersistedRequests sharp edges. Drop the section
and its inbound references (intro, Contents, two inline links).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the "Relationship to Other Docs" section (the Overview already links
the two sibling docs), slim "Key Modules Reference" to a terse code-location
index with section links, and simplify the architecture diagram to a
high-level map (drop middleware names, guard-ordering lists, and the
disk-write legend that the disk section already covers).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adhorodyski

Contributor Author

Resolved all the threads I was interested in; the doc reads well to me.

@OlimpiaZurek

Contributor

LGTM, just a few more Codex findings:

[P2] SEQUENTIAL_QUEUE.md#L5 overstates the delivery guarantee. The overview says every change is delivered “once” / exactly once, but the doc itself later describes duplicate-send windows during mid-flush leadership changes and crash/retry edges. The code also relies on retry/re-drive behavior, not a client-side exactly-once lock. I’d rephrase this as “sent serially and deduplicated/reconciled where needed” rather than “delivered once.”
Source: SEQUENTIAL_QUEUE.md#L5, local evidence around promotion/retry in PersistedRequests.ts#L444.

[P2] SEQUENTIAL_QUEUE.md#L92, #L132, and #L329 describe push() arming isReadyPromise before all offline early-outs, but the code does not. In push(), if the app is already offline, it awaits persistence and returns before setIsReadyPromisePending() is called. The doc should distinguish “offline at entry” from “goes offline during the awaited persist.” Otherwise readers may think offline writes arm and release the READ gate, when they actually leave it untouched.
Source: SequentialQueue.ts#L572.

[P3] SEQUENTIAL_QUEUE.md#L105 and #L147 conflict on whether a crash can leave the same request both queued and ongoing. The restart section says that state can happen and explains head dedupe; the PersistedRequests section says the atomic multiSet prevents it. If dedupe is only defensive/legacy protection, the doc should say that. If the duplicate state is still possible, line 147 should be softened.
Source: SEQUENTIAL_QUEUE.md#L105 and SEQUENTIAL_QUEUE.md#L147.
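
To make the second finding concrete, the two offline cases in push() can be sketched as below. This is a reading of the behaviour described in this thread, not a copy of SequentialQueue.ts: `pushSketch`, `PushDeps`, and the outcome strings are hypothetical, and placing the arming step before the awaited persist on the online path is an assumption inferred from the "offline-during-persist" release site mentioned in the commit message.

```typescript
// Hypothetical dependency bag so the sketch is self-contained and testable.
type PushDeps = {
    isOffline: () => boolean;
    persist: () => Promise<void>;
    armReadGate: () => void;
    releaseReadGate: () => void;
    flush: () => void;
};

type PushOutcome = 'offline-at-entry' | 'offline-during-persist' | 'flushed';

async function pushSketch(deps: PushDeps): Promise<PushOutcome> {
    // Case 1: already offline on entry. Persist and return; the READ gate is never touched.
    if (deps.isOffline()) {
        await deps.persist();
        return 'offline-at-entry';
    }

    // Online on entry: arm the READ gate, then wait for the disk commit (assumed ordering).
    deps.armReadGate();
    await deps.persist();

    // Case 2: the network dropped while the persist was awaited. Release the gate that was armed.
    if (deps.isOffline()) {
        deps.releaseReadGate();
        return 'offline-during-persist';
    }

    // Still online: hand over to the queue, which releases the gate when it goes idle.
    deps.flush();
    return 'flushed';
}
```

Only the second case arms and then releases the gate, which is the distinction the doc would need to draw between "offline at entry" and "goes offline during the awaited persist".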


6 participants