chasm(callback): implement backoff and invocation executor #8499

Merged: spkane31 merged 41 commits into main from spk/callback-executors on Oct 30, 2025

@spkane31
Contributor

What changed?

Adds an InvocationTaskExecutor and a BackoffTaskExecutor to chasm/lib/callback.

Why?

This is the second step of migrating callbacks from HSM to CHASM.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

None; this code is not integrated yet.

@spkane31 spkane31 requested a review from bergundy October 20, 2025 15:39
@spkane31 spkane31 marked this pull request as ready for review October 20, 2025 16:17
@spkane31 spkane31 requested review from a team as code owners October 20, 2025 16:17
@spkane31 spkane31 requested a review from pdoerner October 20, 2025 19:26
Member

@bergundy bergundy left a comment

Please use chasm/statemachine.go and copy over all of the state transitions from the HSM implementation.

Comment thread chasm/lib/callback/executor.go Outdated
_, err := chasm.ReadComponent(
ctx,
invokerRef,
func(c *Callback, ctx chasm.Context, _ any) (struct{}, error) {
Member

Just return what you need here instead of cloning the entire proto IMHO. I would typically rather not clone components for use outside of the chasm context.
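
A minimal sketch of what "return what you need" could look like. Everything here is a simplified assumption for illustration: `readComponent` is a hypothetical stand-in for `chasm.ReadComponent` (whose real signature takes a context and a component reference), and the `Callback` fields and `invocationArgs` struct are invented for the example.

```go
package main

import "errors"

// Callback is a hypothetical stand-in for the component; the real type wraps
// a proto message with many more fields.
type Callback struct {
	RequestID string
	URL       string
	Attempt   int32
	Header    map[string]string
}

// invocationArgs (hypothetical) carries only the values the executor needs
// once the read has finished.
type invocationArgs struct {
	url     string
	attempt int32
}

// readComponent is a simplified stand-in for chasm.ReadComponent: it runs fn
// against the component and hands back whatever fn returns.
func readComponent[T any](c *Callback, fn func(*Callback) (T, error)) (T, error) {
	if c == nil {
		var zero T
		return zero, errors.New("component not found")
	}
	return fn(c)
}

// loadArgs copies out plain values, so nothing it returns aliases the
// component after the read completes.
func loadArgs(c *Callback) (invocationArgs, error) {
	return invocationArgs{url: c.URL, attempt: c.Attempt}, nil
}
```

Calling `readComponent(cb, loadArgs)` yields a small value type directly, so the caller never holds a pointer to the component or a clone of it.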

Comment thread chasm/lib/callback/executor.go Outdated
task *callbackspb.InvocationTask,
) error {
var ns *namespace.Namespace
// var invoker *Invoker
Member

Leftover comment.

Comment thread chasm/lib/callback/executor.go Outdated
) error {
var ns *namespace.Namespace
// var invoker *Invoker
var callback *Callback
Member

This can be the return type from ReadComponent.

Comment thread chasm/lib/callback/executor.go Outdated
_ chasm.TaskAttributes,
_ *callbackspb.InvocationTask,
) (bool, error) {
return callback.Status == callbackspb.CALLBACK_STATUS_SCHEDULED, nil
Member

We should also validate that the attempt matches: copy the attempt to the invocation task when it is scheduled. This would prevent duplicate tasks from being considered valid in some cases.
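
A rough sketch of the attempt check, using simplified stand-in types rather than the generated `callbackspb` ones. The `Attempt` field on the task and the helper name `validateInvocationTask` are assumptions made for the example.

```go
package main

// CallbackStatus and its constants are simplified, hypothetical stand-ins
// for the generated callbackspb enum.
type CallbackStatus int

const (
	StatusScheduled CallbackStatus = iota + 1
	StatusBackingOff
)

// Callback holds only the fields relevant to validation.
type Callback struct {
	Status  CallbackStatus
	Attempt int32
}

// InvocationTask records the attempt it was scheduled for (assumed field).
type InvocationTask struct {
	Attempt int32
}

// validateInvocationTask reports whether the task still applies to the
// callback: the status must be SCHEDULED and the task must have been
// created for the callback's current attempt.
func validateInvocationTask(cb *Callback, task *InvocationTask) bool {
	return cb.Status == StatusScheduled && cb.Attempt == task.Attempt
}
```

With the status check alone, a leftover task from attempt 1 would still look valid once the callback is rescheduled for attempt 2; comparing attempts rejects it.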

Comment thread chasm/lib/callback/executor.go Outdated
case *callbackspb.Callback_Nexus:
// Parse URL to extract scheme and host, matching HSM's behavior
// from statemachine.go:86-90
u, err := url.Parse(variant.Nexus.Url)
Member

@chaptersix we should follow up and change this to use the destination in the token in case the URL is temporal://system.
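
To illustrate the concern, here is a minimal sketch of deriving a destination from a callback URL with the standard `net/url` package. It assumes the destination is formed as `scheme://host`; the helper name `destinationOf` is hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// destinationOf returns "scheme://host" for a callback URL. The format of
// the returned string is an assumption made for this sketch.
func destinationOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", fmt.Errorf("parse callback URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("callback URL %q has no scheme or host", rawURL)
	}
	return u.Scheme + "://" + u.Host, nil
}
```

Under that assumption, every `temporal://system` callback maps to the same destination string no matter where it is actually headed, which is why taking the destination from the token would be more informative.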

map<string, string> header = 2;
}

message InvokerState {
Member

What is this?

Contributor Author

Removed

Comment thread chasm/lib/callback/component.go Outdated

// Fields from HSM's nexusInvocation struct (nexus_invocation.go:35-40)
// These hold the invocation context needed for the HTTP request
completion nexusrpc.OperationCompletion
Member

Don't hold this here, create a separate struct for that as we did in the HSM version.

Comment thread chasm/lib/callback/component.go Outdated
// - HSM invocationResultOK -> CHASM CALLBACK_STATUS_SUCCEEDED
// - HSM invocationResultRetry -> CHASM CALLBACK_STATUS_BACKING_OFF
// - HSM invocationResultFail -> CHASM CALLBACK_STATUS_FAILED
func (c *Callback) invoke(
Member

This shouldn't be a method on the component. I know we did this pattern in scheduler but I do not want us to reference components in general outside of the chasm context, it's too error prone and hard to reason about.

Comment thread chasm/lib/callback/executor.go Outdated
)
defer cancel()

result := callback.invoke(callCtx, ns, e, taskAttributes, task)
Member

You're missing the chasm variant that @lina-temporal recently added.

Comment on lines +36 to +40

string workflow_id = 10;
string run_id = 11;

string namespace_id = 12;
Member

What is this for?

Contributor Author

@spkane31 spkane31 Oct 21, 2025

The namespace is used to look up the request timeout from dynamic config on a per-namespace basis.

Member

Callback is not guaranteed to be attached to a workflow.
Wait for #8533 to be merged and use ExecutionKey() instead.

@spkane31 spkane31 requested a review from bergundy October 23, 2025 22:59
Comment thread chasm/lib/callback/executors.go Outdated
WrapError(result invocationResult, err error) error
}

func (e InvocationTaskExecutor) executeInvocationTask(
Member

You're going to want to call this Execute but you'll figure that out when you register it in the library.

Comment thread chasm/lib/callback/executors.go Outdated
return invokable.WrapError(result, saveErr)
}

func (e InvocationTaskExecutor) loadInvocationArgs(
Member

nit: I would write this as a function that you pass to chasm.ReadComponent to save the closure.

Member

Well... now that you've refactored this function to work within a lock, you can just make it a method of the Callback struct. Same thing for saveResult. You don't actually need e here.

Contributor Author

This makes for an awkward call (imo):

	invokable, err := chasm.ReadComponent(
		ctx,
		ref,
		(*Callback).loadInvocationArgs,
		ctx,
	)

Any issues with that?
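
For readers unfamiliar with the `(*Callback).loadInvocationArgs` form: it is a Go method expression, which turns a method into an ordinary function whose first parameter is the receiver. The sketch below compares it with the equivalent closure. `readComponent` is a simplified, hypothetical stand-in for `chasm.ReadComponent`, and the string input and return types are assumptions chosen to keep the example small.

```go
package main

// Callback is a hypothetical stand-in for the component.
type Callback struct {
	RequestID string
}

// loadInvocationArgs is a method, so it can be referenced as the method
// expression (*Callback).loadInvocationArgs.
func (c *Callback) loadInvocationArgs(prefix string) (string, error) {
	return prefix + c.RequestID, nil
}

// readComponent is a simplified stand-in for chasm.ReadComponent: it calls
// fn with the component and one extra input value.
func readComponent[I, O any](c *Callback, fn func(*Callback, I) (O, error), input I) (O, error) {
	return fn(c, input)
}

// viaMethodExpression passes the method itself. The method expression has
// type func(*Callback, string) (string, error).
func viaMethodExpression(c *Callback) (string, error) {
	return readComponent(c, (*Callback).loadInvocationArgs, "req-")
}

// viaClosure is the equivalent form that wraps the call in a function literal.
func viaClosure(c *Callback) (string, error) {
	return readComponent(c, func(cb *Callback, p string) (string, error) {
		return cb.loadInvocationArgs(p)
	}, "req-")
}
```

Both forms behave the same; the method expression only avoids the wrapper function literal.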

Comment thread chasm/lib/callback/executors.go Outdated
)
}

func (e InvocationTaskExecutor) saveResult(
Member

Same here, write this as a function that you pass to UpdateComponent.

Comment thread chasm/lib/callback/executors.go Outdated
_ *callbackspb.BackoffTask,
) (bool, error) {
// Validate that the callback is in BACKING_OFF state
return callback.Status == callbackspb.CALLBACK_STATUS_BACKING_OFF, nil
Member

Let's also check that the attempt matches.

Comment thread chasm/lib/callback/component.go Outdated
*callbackspb.CallbackState

// Interface to retrieve Nexus operation completion data
CanGetNexusCompletion chasm.Field[CanGetNexusCompletion]
Member

Call this CompletionSource or something more informative?

Comment thread chasm/lib/callback/executors_test.go Outdated
},
expectedMetricOutcome: "status:200",
setupCallback: func(cb *Callback) {
cb.Status = callbackspb.CALLBACK_STATUS_SCHEDULED
Member

Might as well set the status up for all test cases.

Member

You are missing a bunch of test cases that exist in the HSM implementation.

Member

Still not seeing all of the tests ported here.

Contributor Author

| HSM | CHASM |
| --- | --- |
| config_test.go | config_test.go (copied directly) |
| TestValidTransitions | TestValidTransitions (copied) |
| TestCompareState | I don't think this test applies, is there a chasm.CompareState equivalence? |
| TestProcessInvocationTaskNexus_Outcomes | TestExecuteInvocationTaskNexus_Outcomes (renamed) |
| TestProcessInvocationTaskHsm_Outcomes | Not relevant right? |
| TestProcessBackoffTask | TestProcessBackoffTask |
| TestProcessInvocationTaskChasm_Outcomes | TestExecuteInvocationTaskChasm_Outcomes (added now) |

Comment thread chasm/lib/callback/proto/v1/tasks.proto Outdated
-message BackoffTask {}
+message BackoffTask {
+// deadline is the time at which the backoff period ends.
+google.protobuf.Timestamp deadline = 1;
Member

Don't need this, it's part of the task attributes.

Member

And please add attempt for verification.

Comment thread chasm/lib/callback/proto/v1/tasks.proto Outdated
-// Will have other meanings as more callback use cases are added.
-string url = 1;
+// The destination for callbacks. Can be a URL for nexus callbacks or temporal:// for internal callbacks.
+string destination = 1;
Member

Don't need this, it's part of the task attributes.

Contributor Author

Changes to this are triggering our proto breaking change CI check, do we have a way to force skip that check?

Member

Yes, you can disable in buf.yaml.

@spkane31 spkane31 requested a review from bergundy October 27, 2025 22:17
Comment thread chasm/lib/callback/component.go Outdated
Comment on lines +16 to +17
//
// This is the CHASM port of HSM's nexusInvocation struct from nexus_invocation.go:25-32.
Member

Suggested change
-//
-// This is the CHASM port of HSM's nexusInvocation struct from nexus_invocation.go:25-32.

Member

The comment was incorrect.

Comment thread chasm/lib/callback/executors.go Outdated
return nil, err
}

completion, err := target.GetNexusCompletion(ctx, component.RequestId)
Member

Ah I see that the context object is used here, we should be able to use the chasm context here, that's going to be tech debt we need to track. CC @yycptt.

Comment thread chasm/lib/callback/executors.go Outdated
ctx chasm.MutableContext,
result invocationResult,
) (struct{}, error) {
switch result.(type) {
Member

Go through the statemachine transitions here.
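
A minimal sketch of routing the invocation result through explicit transitions instead of assigning the status directly. The `transition` type, the `result` enum, and all names below are simplified assumptions and not the actual `chasm` state machine API. The result-to-status mapping (OK to SUCCEEDED, retry to BACKING_OFF, fail to FAILED) follows the mapping quoted earlier in this review, and each transition is assumed to be valid only from SCHEDULED.

```go
package main

import "fmt"

// Status is a simplified stand-in for callbackspb.CallbackStatus.
type Status int

const (
	StatusScheduled Status = iota + 1
	StatusBackingOff
	StatusSucceeded
	StatusFailed
)

// Callback holds only the field the transitions touch.
type Callback struct {
	Status Status
}

// transition (hypothetical) lists the statuses it may start from and the
// single status it leads to.
type transition struct {
	sources []Status
	dest    Status
}

// Apply moves cb to the destination status only when its current status is
// one of the allowed sources; otherwise cb is left unchanged.
func (t transition) Apply(cb *Callback) error {
	for _, s := range t.sources {
		if cb.Status == s {
			cb.Status = t.dest
			return nil
		}
	}
	return fmt.Errorf("invalid transition from status %d to %d", cb.Status, t.dest)
}

var (
	transitionSucceeded     = transition{sources: []Status{StatusScheduled}, dest: StatusSucceeded}
	transitionAttemptFailed = transition{sources: []Status{StatusScheduled}, dest: StatusBackingOff}
	transitionFailed        = transition{sources: []Status{StatusScheduled}, dest: StatusFailed}
)

// result is a simplified stand-in for the invocationResult variants.
type result int

const (
	resultOK result = iota
	resultRetry
	resultFail
)

// saveResult picks the transition for the result and lets the transition
// enforce that the callback is in a valid source status.
func saveResult(cb *Callback, r result) error {
	switch r {
	case resultOK:
		return transitionSucceeded.Apply(cb)
	case resultRetry:
		return transitionAttemptFailed.Apply(cb)
	case resultFail:
		return transitionFailed.Apply(cb)
	default:
		return fmt.Errorf("unknown invocation result %d", r)
	}
}
```

The benefit is that a result arriving for a callback that is no longer SCHEDULED produces an error rather than silently overwriting the status.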

Comment thread chasm/lib/callback/executors.go Outdated

// Convert the CHASM task to the internal BackoffTask type
// Note: BackoffTask proto is empty, deadline comes from NextAttemptScheduleTime in callback
backoffTask := &callbackspb.BackoffTask{
Member

What is this for?

Comment thread chasm/lib/callback/executors.go Outdated
Comment on lines +233 to +239
executor := InvocationTaskExecutor{
InvocationTaskExecutorOptions: InvocationTaskExecutorOptions{
Config: e.Config,
MetricsHandler: e.MetricsHandler,
Logger: e.Logger,
},
}
Member

Why create another executor?

Comment thread chasm/lib/callback/executors.go Outdated
return callback.Status == callbackspb.CALLBACK_STATUS_BACKING_OFF && callback.Attempt == task.Attempt, nil
}

func (e InvocationTaskExecutor) executeBackoffTask(
Member

This can just be BackoffTaskExecutor.Execute.

@spkane31 spkane31 requested a review from bergundy October 29, 2025 15:30
Comment thread chasm/lib/buf.yaml
Comment on lines 9 to +10
- chasm/lib/scheduler/proto/v1/message.proto
- chasm/lib/callback/proto/v1/tasks.proto
Member

We should have a TODO to remove these after merging this PR. I'd be fine to leave the scheduler one though and let @lina-temporal handle that.

Comment thread chasm/lib/callback/component.go Outdated
func (c *Callback) saveResult(
ctx chasm.MutableContext,
result invocationResult,
) (struct{}, error) {
Member

Not blocking this PR but we need to add a chasm.NoValue type that is *struct{} and just return nil from methods that don't have a return value.
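
A sketch of how such a type might be used, based only on the description above ("a type that is `*struct{}`"). Whether the real `chasm.NoValue` is a type alias or a defined type is not stated here, so the alias below is an assumption, as are `updateComponent` (a simplified stand-in for `chasm.UpdateComponent`) and `markDone`.

```go
package main

// NoValue is assumed here to be an alias for *struct{}, so functions with
// nothing to return can simply return nil.
type NoValue = *struct{}

// Callback is a hypothetical stand-in for the component.
type Callback struct {
	Done bool
}

// updateComponent is a simplified stand-in for chasm.UpdateComponent.
func updateComponent[O any](c *Callback, fn func(*Callback) (O, error)) (O, error) {
	return fn(c)
}

// markDone mutates the component and has no meaningful return value.
func markDone(c *Callback) (NoValue, error) {
	c.Done = true
	return nil, nil
}
```

Compared with returning `struct{}{}`, the call site reads as `return nil, nil` on success and `return nil, err` on failure, which matches the usual Go shape for pointer-returning functions.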

Member

I added this type FTR.

Comment thread chasm/lib/callback/executors_test.go Outdated
}

// Test saveResult transitions
func TestSaveResult(t *testing.T) {
Member

This is redundant IMHO, just test the entire executor.

Contributor Author

Removed

Comment thread chasm/lib/callback/executors_test.go Outdated
}

// Test loadInvocationArgs with ComponentRef
func TestLoadInvocationArgs(t *testing.T) {
Member

Not blocking: This is redundant IMHO.

If you really want to test load and save (which I think isn't required) move the tests to component_test.go.

Contributor Author

removed

Comment thread chasm/lib/callback/metrics.go Outdated
Comment on lines +6 to +10
"chasm_callback_outbound_requests",
metrics.WithDescription("The number of CHASM outbound callback requests made by the history service."),
)
var RequestLatencyHistogram = metrics.NewTimerDef(
"chasm_callback_outbound_latency",
Member

Suggested change
-"chasm_callback_outbound_requests",
-metrics.WithDescription("The number of CHASM outbound callback requests made by the history service."),
-)
-var RequestLatencyHistogram = metrics.NewTimerDef(
-"chasm_callback_outbound_latency",
+"callback_outbound_requests",
+metrics.WithDescription("The number of outbound callback requests made by the history service."),
+)
+var RequestLatencyHistogram = metrics.NewTimerDef(
+"callback_outbound_latency",

Contributor Author

This would conflict with the existing HSM metric. Do we want to emit these metrics with the same name or differentiate the two?

Member

Use the same name; it's a transparent change.

)
var RequestLatencyHistogram = metrics.NewTimerDef(
"chasm_callback_outbound_latency",
metrics.WithDescription("Latency histogram of CHASM outbound callback requests made by the history service."),
Member

Suggested change
-metrics.WithDescription("Latency histogram of CHASM outbound callback requests made by the history service."),
+metrics.WithDescription("Latency histogram of outbound callback requests made by the history service."),

Contributor Author

Same as above: I did this to clarify the difference between the HSM and CHASM versions.

@spkane31 spkane31 requested a review from bergundy October 30, 2025 15:18
Member

@bergundy bergundy left a comment

Approved with comments.

Comment thread chasm/lib/callback/statemachine.go Outdated
[]callbackspb.CallbackStatus{callbackspb.CALLBACK_STATUS_SCHEDULED},
callbackspb.CALLBACK_STATUS_FAILED,
-func(cb *Callback, ctx chasm.MutableContext, event EventFailed) error {
+func(cb *Callback, mctx chasm.MutableContext, event EventFailed) error {
Member

nit: rename back to ctx? Also below, please.

@spkane31 spkane31 enabled auto-merge (squash) October 30, 2025 17:05
@spkane31 spkane31 merged commit 1b33d6a into main Oct 30, 2025
57 checks passed
@spkane31 spkane31 deleted the spk/callback-executors branch October 30, 2025 17:27