Skip to content

[fm] Guard support bundle creation with SitrepGuardedInsert.#10535

Open
mergeconflict wants to merge 17 commits into
mainfrom
mergeconflict/fm-sitrepguardedinsert-support-bundles
Open

[fm] Guard support bundle creation with SitrepGuardedInsert.#10535
mergeconflict wants to merge 17 commits into
mainfrom
mergeconflict/fm-sitrepguardedinsert-support-bundles

Conversation

@mergeconflict

@mergeconflict mergeconflict commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Wire the support bundle resource through SitrepGuardedInsert:

  • impl SitrepGuardedResource for SupportBundle;
  • schema: support_bundle_generation on fm_sitrep and the rendezvous_support_bundle_created marker table (migration fm-bundle-resource-deletion);
  • support_bundle_create's FM path routes through the combinator, surfacing a stale sitrep as Error::Conflict;
  • SitrepBuilder tracks support_bundle_generation, bumping it when the outstanding support-bundle-request set changes; the closed-case carry-forward filter drops fully-satisfied closed cases and keeps those with unsatisfied support bundle requests;
  • fm_rendezvous reads the expected generation and aborts the support bundle loop on a stale mismatch; fm_analysis loads existing markers to drive carry-forward; omdb displays the new status fields and generation.

support_bundle_generation and alert_generation are tracked and guarded independently, so a stale generation for one resource aborts only that resource's rendezvous loop.

Context: #10248, builds on #10564 and #10533.

@mergeconflict mergeconflict self-assigned this Jun 2, 2026
@mergeconflict mergeconflict added the fault-management Everything related to the fault-management initiative (RFD480 and others) label Jun 2, 2026
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from e15c2c1 to e5d67e2 Compare June 2, 2026 19:32
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from 1bc2f60 to 62cc3dd Compare June 2, 2026 19:32
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from e5d67e2 to 809d7d3 Compare June 2, 2026 19:50
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from 62cc3dd to 3d5af68 Compare June 2, 2026 19:50
@AlejandroME AlejandroME added this to the 21 milestone Jun 4, 2026
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from 809d7d3 to 7128ff2 Compare June 4, 2026 20:56
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch 2 times, most recently from a1beef9 to e97748d Compare June 5, 2026 16:38
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from 7128ff2 to 1574845 Compare June 5, 2026 16:38
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from e97748d to f3832d4 Compare June 8, 2026 16:31
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from 1574845 to 4d0f1dd Compare June 8, 2026 16:31
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from f3832d4 to cbc506b Compare June 8, 2026 16:41
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch 2 times, most recently from 85edf6b to a3ab6c2 Compare June 8, 2026 16:52
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from cbc506b to 59680b0 Compare June 8, 2026 16:52
Add the `SitrepGuardedInsert` Diesel combinator and the
`SitrepGuardedResource` trait: a generic primitive for FM rendezvous to
insert a resource row idempotently and guarded against stale-sitrep
execution.

The combinator wraps a caller-supplied resource INSERT in a single CTE
statement that:

  - aborts (StaleSitrep) unless the executor's expected generation still
    equals the latest sitrep's generation column;
  - short-circuits (AlreadyExists) if a creation marker already exists for
    the resource id;
  - on a successful insert, atomically writes a creation marker, gated by
    `WHERE EXISTS (SELECT 1 FROM new_resource)` so a marker is never
    fabricated for a pre-existing row.

All spliced SQL identifiers come from the trait's `&'static str` consts, so
the query is injection-safe. The result is surfaced as a
`SitrepGuardedInsertOutcome` of Created / AlreadyExists / StaleSitrep.
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from 59680b0 to b6f5bfe Compare June 8, 2026 17:57
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from a3ab6c2 to 26cbbe1 Compare June 8, 2026 17:57
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from b6f5bfe to a7d6f65 Compare June 8, 2026 20:08
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from 26cbbe1 to da05cf5 Compare June 8, 2026 20:08
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from a7d6f65 to aee00ec Compare June 8, 2026 20:12
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from da05cf5 to 064090e Compare June 8, 2026 20:12
Comment thread nexus/tests/integration_tests/support_bundles.rs Outdated
Comment thread nexus/fm/src/analysis_input.rs Outdated
Comment thread schema/crdb/dbinit.sql
mergeconflict added a commit that referenced this pull request Jun 10, 2026
Add the `SitrepGuardedInsert` Diesel combinator and the
`SitrepGuardedResource` trait: a generic primitive for FM rendezvous to
insert a resource row idempotently and guarded against stale-sitrep
execution.

The combinator wraps a caller-supplied resource INSERT in a single CTE
statement that:

- aborts (StaleSitrep) unless the executor's expected generation still
equals the latest sitrep's generation column;
- short-circuits (AlreadyExists) if a creation marker already exists for
the resource id;
  - on a successful insert, atomically writes a creation marker.

The result is surfaced as a `SitrepGuardedInsertOutcome` of `Created` /
`AlreadyExists` / `StaleSitrep`.

Context: #10248. This PR
was previously #10532 but I made a mess of it. This is used in
#10533 and
#10535 which are split out
in hopes of making the review somewhat less miserable.
Wire the alert resource through `SitrepGuardedInsert`:

  - `impl SitrepGuardedResource for Alert`;
  - schema: `alert_generation` on `fm_sitrep` and the
    `rendezvous_alert_created` marker table (migration
    fm-alert-resource-deletion);
  - `alert_create`'s FM path routes through the combinator, surfacing a
    stale sitrep as a monomorphic `Error::Conflict`;
  - `SitrepBuilder` tracks `alert_generation`, bumping it when the
    outstanding alert-request set changes; the closed-case carry-forward
    filter drops fully-satisfied closed cases;
  - fm_rendezvous reads the expected generation and aborts the alert loop
    on a stale mismatch; fm_analysis loads existing markers to drive
    carry-forward; omdb displays the new status fields and generation.
The support-bundle mirror of the preceding alert change: `impl
SitrepGuardedResource for SupportBundle`, the `support_bundle_generation`
column and `rendezvous_support_bundle_created` marker table (migration
fm-bundle-resource-deletion), `support_bundle_create`'s FM path routed
through the combinator inside its transaction, support-bundle generation
tracking and carry-forward, the fm_rendezvous bundle loop, the fm_analysis
bundle-marker lookup, and the omdb display.

`support_bundle_generation` and `alert_generation` are tracked and guarded
independently, so a stale generation for one resource aborts only that
resource's rendezvous loop.
hawkw pushed a commit that referenced this pull request Jun 10, 2026
Add the `SitrepGuardedInsert` Diesel combinator and the
`SitrepGuardedResource` trait: a generic primitive for FM rendezvous to
insert a resource row idempotently and guarded against stale-sitrep
execution.

The combinator wraps a caller-supplied resource INSERT in a single CTE
statement that:

- aborts (StaleSitrep) unless the executor's expected generation still
equals the latest sitrep's generation column;
- short-circuits (AlreadyExists) if a creation marker already exists for
the resource id;
  - on a successful insert, atomically writes a creation marker.

The result is surfaced as a `SitrepGuardedInsertOutcome` of `Created` /
`AlreadyExists` / `StaleSitrep`.

Context: #10248. This PR
was previously #10532 but I made a mess of it. This is used in
#10533 and
#10535 which are split out
in hopes of making the review somewhat less miserable.
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-alerts branch from aee00ec to aab93e8 Compare June 10, 2026 21:35
@mergeconflict mergeconflict force-pushed the mergeconflict/fm-sitrepguardedinsert-support-bundles branch from 064090e to 5f46981 Compare June 10, 2026 21:35
Rename first_sitrep_with_alert_request_bumps_generation ->
first_sitrep_with_alert_request_bumps_alert_generation so the alert test
reads clearly alongside the support bundle generation tests added later.
@mergeconflict mergeconflict requested a review from smklein June 11, 2026 16:17
Base automatically changed from mergeconflict/fm-sitrepguardedinsert-alerts to main June 18, 2026 03:59
Comment thread nexus/db-model/src/schema_versions.rs Outdated
Comment on lines +128 to 144
/// For [`SupportBundleProvenance::Fm`] the bundle INSERT is gated by
/// [`SitrepGuardedInsert`], which also writes the
/// `rendezvous_support_bundle_created` marker atomically. The
/// transaction encloses the guarded insert and the data-selection
/// inserts, so a marker is never durable without its data-selection
/// rows. If a bundle with the given id already exists, returns
/// [`Error::ObjectAlreadyExists`]. If the latest sitrep's
/// `support_bundle_generation` has advanced past
/// `expected_support_bundle_generation`, returns [`Error::Conflict`]; the
/// caller should skip the request rather than retrying.
///
/// The free-dataset lookup runs before the generation guard, so a stale
/// rendezvous activation that also finds no free dataset reports
/// [`Error::insufficient_capacity`] rather than [`Error::Conflict`].
/// This only affects status reporting and log noise: the guard still
/// prevents a stale activation from inserting anything.
pub async fn support_bundle_create(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. similarly to the suggestion I left on #10533, I wonder if we might want to refactor this as separate support_bundle_create and fm_rendezvous_support_bundle_create methods, rather than passing in a "provenance" type that tells the method which behavior to do --- I can't really imagine having code that wants to be able to call support_bundle_create with either provenance dynamically; we're only doing one code path or the other, so it feels a bit nicer to me for them to just be separate methods.

with that said, I realize that (unlike alerts) the provenance enum was already here, so 🤷‍♀️

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a reason for keeping the switch-on-provenance thing here, but now I honestly can't remember what it was or whether it was a good reason. I'll think about this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I messed with this a bit and decided to stop messing with it. The direction it was going, I had three separate SupportBundleCreateParams structs: one for support_bundle_create, another for fm_rendezvous_support_bundle_create, and a third for their shared helper function (which had some super fun trait bounds for the insert query closure)... It was getting ugly, and then my efforts to de-uglify all pointed in the direction of "hey wouldn't it be nice to just switch on provenance."

Comment on lines +316 to +319
return external::Error::conflict(
"cannot create FM-originated support \
bundle for stale sitrep",
);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to what we did in #10533, I think that if we're going to split the FM and user-requested support bundle creation code paths into separate functions, it would be nicer to just have the FM one return an error enum that represents these specific error cases more explicitly, rather than an external::Error.

I was going to suggest that we could take the FmRendezvousAlertCreateError type added in #10533 and rename it to FmRendezvousCreateError or something, and remove the string "alert" from the docs/formatted output (we could say the generic "resource" or something instead). But, I realized that support bundle creation also needs to be able to represent "insufficient capacity". I suppose we could do something like

enum FmRendezvousSupportBundleCreateError {
    #[error("insufficient capacity for blah blah blah whatever")]
    InsufficientCapacity,
    #[error(transparent)]
    Create(#[from] FmRendezvousCreateError),
}

or something? I dunno if that's really worth it over just having separate error types...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, see above... I agree in principle but it's a bit of a pain here in practice, so I'm punting on this.

Comment thread nexus/db-queries/src/db/sitrep_guard.rs Outdated
Comment thread nexus/fm/src/analysis_input.rs Outdated
Comment thread nexus/fm/src/builder.rs Outdated
Comment thread nexus/fm/src/builder.rs Outdated
Comment thread nexus/fm/src/builder.rs Outdated
Comment thread nexus/fm/src/builder.rs Outdated
Comment thread nexus/fm/src/builder.rs Outdated
@mergeconflict mergeconflict requested a review from hawkw June 18, 2026 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fault-management Everything related to the fault-management initiative (RFD480 and others)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants