[wasm][interp] Implement implicit tail call optimization for interpreter #128318

Draft
radekdoulik wants to merge 4 commits into dotnet:main from radekdoulik:interp-implicit-tailcall
Conversation

@radekdoulik
Member

Add implicit tail call (ITC) detection to the CoreCLR interpreter, matching the JIT's approach in fgMorphPotentialTailCall. When a non-virtual, non-calli, non-newobj call immediately precedes a ret, and safety checks pass, the call is promoted to a tail call.

Safety checks:

  • IL pre-scan in CreateBasicBlocks sets m_hasAddressExposedLocals (ldloca, ldarga) and m_hasLocalloc (localloc) before code generation, ensuring the flags are set regardless of IL instruction ordering.
  • CallHasByrefIntoLocalFrame rejects arguments whose types are inherently unsafe (CORINFO_TYPE_PTR, CORINFO_TYPE_REFANY, byref-like value classes) regardless of provenance.
  • canTailCall receives isExplicitTailCall=false for implicit tail calls, letting the VM apply stricter validation (StackCrawlMark, NoInlining).

Also re-enables the F# mutual_recursion tail call test on Browser/WASM that was blocked by interpreter stack overflow before this optimization.
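
For readers following along, the detection rule described above can be sketched roughly as follows. This is a toy model, not the code in compiler.cpp: the `Op` enum, `MethodFacts`, and `IsImplicitTailCallCandidate` are hypothetical names, the IL is assumed to be already decoded into opcodes, and the `canTailCall` query to the VM is left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Toy opcode set (hypothetical); the real interpreter decodes raw IL bytes.
enum class Op { Nop, LdArg, Call, CallVirt, CallI, NewObj, Ret };

// Hypothetical per-method facts gathered before code generation.
struct MethodFacts {
    bool hasAddressExposedLocals = false; // method contains ldloca / ldarga
    bool hasLocalloc = false;             // method contains localloc
};

// Returns true when il[index] is a plain `call` that is immediately followed
// by `ret` and neither method-level safety flag is set.
bool IsImplicitTailCallCandidate(const std::vector<Op>& il,
                                 std::size_t index,
                                 const MethodFacts& facts)
{
    if (index >= il.size() || il[index] != Op::Call)
        return false; // callvirt, calli and newobj are never promoted
    if (index + 1 >= il.size() || il[index + 1] != Op::Ret)
        return false; // the call must immediately precede ret
    if (facts.hasAddressExposedLocals || facts.hasLocalloc)
        return false; // a pointer into the current frame might escape
    return true;
}
```

In this model a method such as `ldarg; call; ret` with clean flags yields a candidate at index 1, while the same sequence in a method that uses `ldloca` anywhere does not.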

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radekdoulik radekdoulik added this to the Future milestone May 18, 2026
Copilot AI review requested due to automatic review settings May 18, 2026 11:14
@radekdoulik radekdoulik added arch-wasm WebAssembly architecture area-Interop-coreclr labels May 18, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds implicit tail call (ITC) detection to the CoreCLR interpreter: when a non-virtual / non-calli / non-newobj call is immediately followed by ret and passes safety checks, it is automatically promoted to a tail call. Mirrors the JIT's fgMorphPotentialTailCall approach so deeply recursive code (notably the F# mutual_recursion test on Browser/WASM) no longer overflows the interpreter stack.

Changes:

  • Pre-scan IL in CreateBasicBlocks to set m_hasAddressExposedLocals (ldloca/ldarga, both wide and short forms) and m_hasLocalloc; also set the address-exposed flag in EmitLdLocA.
  • Add CallHasByrefIntoLocalFrame to reject calls whose argument types are inherently unsafe (CORINFO_TYPE_PTR, CORINFO_TYPE_REFANY, CORINFO_FLG_BYREF_LIKE value classes), and wire implicit-tailcall detection into EmitCall, passing isExplicitTailCall=false to canTailCall so the VM applies stricter validation.
  • Re-enable mutual_recursion.fs on Browser by removing the ActiveIssue attribute for issue #127437.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/coreclr/interpreter/compiler.h | Declares the new method-level flags and CallHasByrefIntoLocalFrame helper. |
| src/coreclr/interpreter/compiler.cpp | Implements IL pre-scan flag setting, the new byref-arg helper, and the implicit-tailcall detection logic in EmitCall. |
| src/tests/JIT/Directed/tailcall/mutual_recursion.fs | Removes the Browser ActiveIssue since the interpreter no longer overflows on this test. |

Comment thread src/coreclr/interpreter/compiler.cpp Outdated
Remove CallHasByrefIntoLocalFrame: it is redundant with the
m_hasAddressExposedLocals and m_hasLocalloc checks. A pointer into the
current frame can only originate from ldloca/ldarga (caught by
m_hasAddressExposedLocals) or localloc (caught by m_hasLocalloc).
Byref-like types, PTR, and REFANY passed as arguments cannot hold
frame-interior pointers unless one of those opcodes was used.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jkotas
Member

jkotas commented May 18, 2026

Also re-enables the F# mutual_recursion tail call test on Browser/WASM that was blocked by interpreter stack overflow before this optimization.

F# should use explicit tail. prefix when it depends on tail calls for correctness. Why is it not the case here?

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Comment thread src/coreclr/interpreter/compiler.cpp
#endif
}

// Implicit tail call: convert call+ret into a tail call when safe.
Member

Regular JIT does the implicit tail call optimization only when optimizations are enabled. Does this need the same?

Member Author

I have made it driven by CORJIT_FLAGS::CORJIT_FLAG_DEBUG_CODE and CORJIT_FLAGS::CORJIT_FLAG_MIN_OPT.

In JIT the code is also wrapped with FEATURE_TAILCALL_OPT ifdef's. Do you want to make it a compile-time option as well? What would be a good place to move it outside the JIT: src/coreclr/inc/switches.h?

Member

In JIT the code is also wrapped with FEATURE_TAILCALL_OPT ifdef's.

This optimization is not supported on some architectures in the JIT. The define exists to control whether this optimization is supported on a given architecture.

The optimization in the interpreter is architecture neutral, so there is no need for this define in the interpreter.

- Disable ITC when CORJIT_FLAG_DEBUG_CODE or CORJIT_FLAG_MIN_OPT is set,
  matching the JIT's behavior of only performing implicit tail calls when
  optimizations are enabled.
- Remove redundant m_hasAddressExposedLocals assignment from EmitLdLocA
  (already set by the pre-scan in CreateBasicBlocks).
- Add comments explaining why the pre-scan sets flags before GenerateCode:
  GenerateCode processes IL sequentially, so a call;ret before ldloca/localloc
  would miss the flag without the pre-scan.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
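
The ordering problem that the commit message above describes can be illustrated with a small model. This is only a sketch under stated assumptions: `ToyOp`, `ScanFlags`, `PreScan`, and `CountPromotedCalls` are hypothetical names, the IL is treated as a pre-decoded opcode list, and the `optimizationsEnabled` parameter merely stands in for neither CORJIT_FLAG_DEBUG_CODE nor CORJIT_FLAG_MIN_OPT being set.

```cpp
#include <cstddef>
#include <vector>

// Toy, pre-decoded opcodes (hypothetical; the real code walks raw IL bytes).
enum class ToyOp { Call, Ret, LdLocA, LdArgA, Localloc, Br, Other };

struct ScanFlags {
    bool addressExposed = false; // saw ldloca / ldarga
    bool localloc = false;       // saw localloc
};

// Pre-scan: visit the whole method once, before any code is emitted.
ScanFlags PreScan(const std::vector<ToyOp>& il)
{
    ScanFlags flags;
    for (ToyOp op : il) {
        if (op == ToyOp::LdLocA || op == ToyOp::LdArgA) flags.addressExposed = true;
        if (op == ToyOp::Localloc) flags.localloc = true;
    }
    return flags;
}

// Counts the call;ret pairs that would be promoted to tail calls. With
// usePreScan == false the flags are only updated as the sequential pass
// reaches each opcode, so a call;ret placed earlier in the IL than an
// ldloca or localloc is promoted by mistake.
int CountPromotedCalls(const std::vector<ToyOp>& il, bool usePreScan, bool optimizationsEnabled)
{
    if (!optimizationsEnabled)
        return 0; // models debug / min-opt compilation: no implicit tail calls
    ScanFlags flags = usePreScan ? PreScan(il) : ScanFlags{};
    int promoted = 0;
    for (std::size_t i = 0; i < il.size(); ++i) {
        if (!usePreScan) {
            if (il[i] == ToyOp::LdLocA || il[i] == ToyOp::LdArgA) flags.addressExposed = true;
            if (il[i] == ToyOp::Localloc) flags.localloc = true;
        }
        bool callThenRet = il[i] == ToyOp::Call && i + 1 < il.size() && il[i + 1] == ToyOp::Ret;
        if (callThenRet && !flags.addressExposed && !flags.localloc)
            ++promoted;
    }
    return promoted;
}
```

For an IL layout such as `br; call; ret; ldloca; br`, where the `ldloca` block sits after the `call; ret` in the byte stream, the lazy variant promotes the call while the pre-scan variant does not.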
Copilot AI review requested due to automatic review settings May 18, 2026 20:19
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@radekdoulik
Member Author

Also re-enables the F# mutual_recursion tail call test on Browser/WASM that was blocked by interpreter stack overflow before this optimization.

F# should use explicit tail. prefix when it depends on tail calls for correctness. Why is it not the case here?

Good question. F# uses the explicit tail. prefix only in the Release configuration.

With the optimization flags in place, ITC will not be used in Debug though. I disabled the test in the Debug configuration.

@jkotas
Member

jkotas commented May 19, 2026

F# uses the explicit tail. prefix only in the Release configuration.

Huh. Is that a bug?

@jkotas
Member

jkotas commented May 19, 2026

With the optimization flags in place, ITC will not be used in Debug though. I disabled the test in the Debug configuration.

Why do we need this fix at all then?

If you run the Debug configuration, F# won't produce the tail. prefix and the interpreter optimization will be disabled, so the scenario won't work.

If you run the Release configuration, F# will produce the tail. prefix and the interpreter will respect it, so the scenario will work even without this fix.
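
The two cases above can be written out as a small truth table. The sketch below only encodes the claims made in this thread and does not verify them: `BuildConfig` and `DeepRecursionSurvives` are hypothetical names, and it assumes that the Debug configuration both suppresses the F# tail. prefix and disables the interpreter's implicit tail call optimization.

```cpp
enum class BuildConfig { Debug, Release };

// Returns whether the deep mutual recursion scenario avoids stack overflow,
// under the assumptions stated in the thread:
//  - F# emits the explicit tail. prefix only in Release.
//  - The implicit tail call optimization (ITC) is off when optimizations
//    are disabled, which is modelled here as the Debug configuration.
bool DeepRecursionSurvives(BuildConfig config, bool interpreterHasItc)
{
    bool explicitTailPrefix = (config == BuildConfig::Release);
    bool itcActive = interpreterHasItc && (config == BuildConfig::Release);
    return explicitTailPrefix || itcActive;
}
```

Under these assumptions the result depends only on the configuration, not on whether the interpreter has ITC, which is the point of the question.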
