Skip to content

fix(tracing): finish active trace on crash#1667

Merged
jpnurmi merged 22 commits into
masterfrom
jpnurmi/fix/trace-finish
May 26, 2026
Merged

fix(tracing): finish active trace on crash#1667
jpnurmi merged 22 commits into
masterfrom
jpnurmi/fix/trace-finish

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Apr 23, 2026

Before, an in-flight trace was silently dropped on crash:

// TODO(tracing): Revisit when investigating transaction flushing
// during hard crashes.

Even with a (hypothetical) manual finish in the app right before a crash, the crash event still "floated" at the trace root rather than nested under the active span.

This PR auto-finishes the active span/transaction on crash with status aborted, and preserves scope->span / scope->transaction_object across the finish, so the trace reaches Sentry and the crash nests under it.

❌ Before (not finished, trace lost) ❌ Before (manual in-app force-finish) ✅ After (auto-finish)
image image image

For comparison:

Android example
private fun crashWithNestedTrace() {
  val transaction =
    Sentry.startTransaction(
      "sentry-android.finish-on-crash",
      "demo.crash",
      TransactionOptions().apply { isBindToScope = true },
    )
  transaction.setTag("trace_finish_demo", "android")
  transaction.setData("expected", "sentry-android force-finishes this transaction on crash")

  fun openSpan(parent: ISpan, operation: String, description: String, delayMs: Long): ISpan {
    val span = parent.startChild(operation, description)
    SystemClock.sleep(delayMs)
    return span
  }

  val screen = openSpan(transaction, "ui.screen", "ErrorsScreen", 150)
  val click = openSpan(screen, "ui.action", "Crash with Nested Trace", 150)
  val work = openSpan(click, "demo.work", "leave nested spans unfinished", 150)
  val crash = openSpan(work, "demo.crash", "throw uncaught exception", 150)
  crash.setData("left_open", true)

  throw RuntimeException("Uncaught Exception with unfinished nested Sentry spans.")
}
image

https://sentry-sdks.sentry.io/issues/trace/80b71be4838c4130bd4954c2fe66ce13/?groupId=7506217367&node=span-8f25cb757ac94603&pageEnd=2026-05-26T17%3A50%3A07.509&pageStart=2026-05-25T17%3A50%3A07.509&project=5428559&query=is%3Aunresolved&referrer=issue-stream&source=issue_details&timestamp=1779774604.853

Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c Outdated
Comment thread src/sentry_tracing.c
Comment thread src/sentry_core.c
@jpnurmi jpnurmi force-pushed the jpnurmi/fix/trace-finish branch from 1764758 to 570ed57 Compare April 24, 2026 15:35
@jpnurmi jpnurmi changed the title WIP: fix(tracing): finish active trace on crash fix(tracing): finish active trace on crash Apr 25, 2026
@jpnurmi jpnurmi force-pushed the jpnurmi/fix/trace-finish branch from 570ed57 to 264a69b Compare April 27, 2026 08:13
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
@jpnurmi jpnurmi changed the title fix(tracing): finish active trace on crash WIP: fix(tracing): finish active trace on crash Apr 27, 2026
Comment thread src/backends/sentry_backend_breakpad.cpp Outdated
Comment thread src/backends/sentry_backend_breakpad.cpp Outdated
Comment thread src/sentry_envelope.c Outdated
Comment thread src/backends/native/sentry_crash_daemon.c
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.

Comment thread src/backends/sentry_backend_inproc.c
Comment thread src/backends/sentry_backend_native.c
jpnurmi and others added 12 commits May 20, 2026 16:16
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior crash-path trace finish only closed `scope->span` (the deepest
active span) and its root transaction. Intermediate spans between them —
e.g. Qt event dispatch handlers nested above a crashing slot — were never
finished, never added to `tx.spans`, and ended up orphaning the crash
event at the trace root in Sentry's UI.

Track every span on its root transaction under `children_mutex` at
`sentry__span_new`, deregister on normal finish, and drain the list in
leaf-first order inside `sentry__trace_finish`. Matches sentry-cocoa's
`SentryTracer` `_children` + `finishForCrash` and sentry-java's
`SentryTracer.forceFinish`.

Preserve `scope->span` / `scope->transaction_object` across the drain so
the subsequent crash event inherits the full trace context from scope
(mirrors cocoa's `finishTracer:shouldCleanUp:NO`). Without this, the
crash event fell through to the stale propagation context and Sentry
could not nest it under the active span chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim WHAT-narration comments in sentry__trace_finish and its tests, keeping
the sentry-cocoa alignment anchor and the non-obvious "detach to skip
remove-scan" rationale. Warn on the tracking-list allocation failure so a
silent crash-finalize gap is audible. Use the typedef name for the forward
declaration in sentry_sampling_context.h for consistency with the rest of
the codebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split sentry__trace_finish into save_active_trace / restore_active_trace
(as a pair around the scope mutation) and finish_children for the atomic
children swap + finish loop, so the top-level function reads as a short
narrative. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_transaction_finish_ts consumes its argument, so the ref that
save_active_trace took for active_tx is released by the finish call.
restore_active_trace was decref'ing it again. Harmless in crash context
where the process exits, but a real refcount underflow otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_span_finish_ts consumes its argument (decref at sentry_core.c:1607
on success, :1611 on fail), so the explicit sentry__span_decref after the
finish call released the children-list ref a second time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The transaction's `children` list held an incref'd reference on each live
span, while each span holds a ref on its transaction via `span->transaction`.
Any code path that decref'd a span without going through `sentry_span_finish_ts`
(e.g. the `span_data` unit test using the low-level `sentry__span_new` /
`sentry__span_decref` API) left both sides stuck at refcount 1, leaking both.

Make `tx->children` weak: `sentry__span_new` no longer increfs on add, and
`sentry__span_decref` unlists the span from the children list on its final
drop. `sentry__transaction_remove_child` correspondingly no longer decrefs.
The drain path in `sentry__trace_finish` continues to work — the spans on
the swapped-out list are alive via their other refs (scope / saved_span /
user var), and `sentry_span_finish_ts` consumes one of those refs as the
caller's, exactly as in the non-crash flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the weak-children refactor, `finish_children` handed each span to
`sentry_span_finish_ts` without a caller ref. `finish_ts` consumes one
ref on its argument, so when the drained span had only the user's own ref
outstanding, finish_ts would drop it to zero and free the span — leaving
the user's pointer dangling.

Incref inside the drain loop so finish_ts consumes "our" ref, leaving user
refs intact. Also update the `trace_finish` unit test to release the refs
it held on `tx`/`child`/`grand` (without the children-list strong ref,
unfinished spans now properly leak if the user forgets to decref).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SENTRY_WITH_SCOPE_MUT calls sentry__scope_flush_unlock on exit, which
invokes the backend's flush_scope_func — file I/O in both crashpad and
native backends. sentry__trace_finish runs from signal-handler context
in the inproc/breakpad/native crash handlers, so the flush is unsafe
there. Use the NO_FLUSH variant the codebase provides for this exact
situation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Call restore_active_trace before the early return so any refs taken by
save_active_trace are released even if active_tx is NULL. No-op under
the current invariant (saved_span->transaction is always non-NULL), but
defensive if the invariant ever changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi force-pushed the jpnurmi/fix/trace-finish branch from 6a2e71b to 336c007 Compare May 21, 2026 05:15
@jpnurmi jpnurmi changed the title WIP: fix(tracing): finish active trace on crash fix(tracing): finish active trace on crash May 22, 2026
Comment thread src/backends/sentry_backend_inproc.c
@jpnurmi jpnurmi requested review from JoshuaMoelans and mujacica May 26, 2026 06:36
Copy link
Copy Markdown
Member

@JoshuaMoelans JoshuaMoelans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM! Small open Q on the trace-closing path now calling some SENTRY_WITH_SCOPE_MUT scope-flushes.

Also, maybe we want to add a note to our before_send_transaction that it too is potentially ran in a signal handler (similar to the ones we have for before_send/on_crash)

 * This function may be invoked inside a signal handler and must be safe for
 * that purpose, see https://man7.org/linux/man-pages/man7/signal-safety.7.html.
 * On Windows, it may be called from inside a `UnhandledExceptionFilter`, see
 * the documentation on SEH (structured exception handling) for more information
 * https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling

Comment thread src/sentry_tracing.c
sentry_span_t *child = children[i];
sentry__span_incref(child);
sentry_span_set_status(child, status);
sentry_span_finish_ts(child, end_ts);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: is it a problem that, during a crash, in here we do SENTRY_WITH_SCOPE_MUT (scope) {... which calls our backend scope flush function?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't sound optimal to flush during crashes, but both out-of-process backends that have flush functions are guarded against flushing during a crash:

if (!data->event_path || data->crashed.load(std::memory_order_relaxed)
|| !data->scope_flush.compare_exchange_strong(
expected, true, std::memory_order_acquire)) {
return;
}

if (sentry__atomic_fetch(&state->crashed)) {
return;
}

@jpnurmi jpnurmi merged commit 7d06d7d into master May 26, 2026
63 checks passed
@jpnurmi jpnurmi deleted the jpnurmi/fix/trace-finish branch May 26, 2026 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants