[wasm][interp] Implement implicit tail call optimization for interpreter#128318
radekdoulik wants to merge 4 commits into
Add implicit tail call (ITC) detection to the CoreCLR interpreter, matching the JIT's approach in fgMorphPotentialTailCall. When a non-virtual, non-calli, non-newobj call immediately precedes a ret, and safety checks pass, the call is promoted to a tail call.

Safety checks:
- IL pre-scan in CreateBasicBlocks sets m_hasAddressExposedLocals (ldloca, ldarga) and m_hasLocalloc (localloc) before code generation, ensuring the flags are set regardless of IL instruction ordering.
- CallHasByrefIntoLocalFrame rejects arguments whose types are inherently unsafe (CORINFO_TYPE_PTR, CORINFO_TYPE_REFANY, byref-like value classes) regardless of provenance.
- canTailCall receives isExplicitTailCall=false for implicit tail calls, letting the VM apply stricter validation (StackCrawlMark, NoInlining).

Also re-enables the F# mutual_recursion tail call test on Browser/WASM that was blocked by interpreter stack overflow before this optimization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds implicit tail call (ITC) detection to the CoreCLR interpreter: when a non-virtual / non-calli / non-newobj call is immediately followed by ret and passes safety checks, it is automatically promoted to a tail call. Mirrors the JIT's fgMorphPotentialTailCall approach so deeply recursive code (notably the F# mutual_recursion test on Browser/WASM) no longer overflows the interpreter stack.
Changes:
- Pre-scan IL in CreateBasicBlocks to set m_hasAddressExposedLocals (ldloca/ldarga, both wide and short forms) and m_hasLocalloc; also set the address-exposed flag in EmitLdLocA.
- Add CallHasByrefIntoLocalFrame to reject calls whose argument types are inherently unsafe (CORINFO_TYPE_PTR, CORINFO_TYPE_REFANY, CORINFO_FLG_BYREF_LIKE value classes), and wire implicit-tailcall detection into EmitCall, passing isExplicitTailCall=false to canTailCall so the VM applies stricter validation.
- Re-enable mutual_recursion.fs on Browser by removing the ActiveIssue attribute for issue #127437.
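
The detection rule summarized above can be sketched roughly as follows. This is a minimal illustration, not the interpreter's actual code: the `Op` enum, the `MethodFlags` struct, and `IsImplicitTailCallCandidate` are hypothetical stand-ins, and the sketch works on pre-decoded opcodes rather than raw IL bytes. It also leaves out the argument-type check and the final `canTailCall` query to the VM.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified opcode set; the real interpreter decodes raw IL bytes.
enum class Op { Nop, Call, CallVirt, Calli, NewObj, Ret, Other };

// Stand-ins for the method-level flags named in the PR description.
struct MethodFlags {
    bool hasAddressExposedLocals = false; // some ldloca/ldarga exists in the method
    bool hasLocalloc = false;             // some localloc exists in the method
};

// Returns true when the instruction at index `i` is a direct call that is
// immediately followed by ret, and no frame-escaping construct was recorded.
bool IsImplicitTailCallCandidate(const std::vector<Op>& il,
                                 std::size_t i,
                                 const MethodFlags& flags)
{
    if (i + 1 >= il.size())
        return false;                 // nothing follows the instruction
    if (il[i] != Op::Call)
        return false;                 // rejects callvirt, calli, newobj
    if (il[i + 1] != Op::Ret)
        return false;                 // the call must immediately precede ret
    if (flags.hasAddressExposedLocals || flags.hasLocalloc)
        return false;                 // a pointer into the frame may exist
    return true;                      // real code would still consult the VM
}
```

Under these assumptions, a `call` followed by `ret` in a method with clean flags is a candidate, while the same sequence in a method that uses `ldloca` or `localloc` anywhere is not.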
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/interpreter/compiler.h | Declares the new method-level flags and CallHasByrefIntoLocalFrame helper. |
| src/coreclr/interpreter/compiler.cpp | Implements IL pre-scan flag setting, the new byref-arg helper, and the implicit-tailcall detection logic in EmitCall. |
| src/tests/JIT/Directed/tailcall/mutual_recursion.fs | Removes the Browser ActiveIssue since the interpreter no longer overflows on this test. |
Remove CallHasByrefIntoLocalFrame: it is redundant with the m_hasAddressExposedLocals and m_hasLocalloc checks. A pointer into the current frame can only originate from ldloca/ldarga (caught by m_hasAddressExposedLocals) or localloc (caught by m_hasLocalloc). Byref-like types, PTR, and REFANY passed as arguments cannot hold frame-interior pointers unless one of those opcodes was used.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
F# should use the explicit tail. prefix when it depends on tail calls for correctness. Why is that not the case here?
Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
    #endif
    }

    // Implicit tail call: convert call+ret into a tail call when safe.
Regular JIT does the implicit tail call optimization only when optimizations are enabled. Does this need the same?
I have made it driven by CORJIT_FLAGS::CORJIT_FLAG_DEBUG_CODE and CORJIT_FLAGS::CORJIT_FLAG_MIN_OPT.
In the JIT the code is also wrapped in FEATURE_TAILCALL_OPT ifdefs. Do you want to make it a compile-time option as well? What would be a good place for the define outside the JIT: src/coreclr/inc/switches.h?
> In JIT the code is also wrapped with FEATURE_TAILCALL_OPT ifdef's.

This optimization is not supported on some architectures in the JIT. The define exists to control whether the optimization is supported on a given architecture.
The optimization in the interpreter is architecture neutral, so there is no need for this define in the interpreter.
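
The runtime gating discussed in this thread (as opposed to a compile-time define) might look roughly like the sketch below. The `JitFlag` bit values and `ImplicitTailCallsEnabled` are hypothetical stand-ins for illustration only; the real flags are carried by `CORJIT_FLAGS` and are not plain bit masks like this.

```cpp
#include <cstdint>

// Stand-in bit flags for illustration; not the real CORJIT_FLAGS representation.
enum JitFlag : std::uint32_t {
    DebugCode = 1u << 0, // models CORJIT_FLAG_DEBUG_CODE
    MinOpt    = 1u << 1  // models CORJIT_FLAG_MIN_OPT
};

// Implicit tail calls are considered only when neither debug code
// nor minimal optimization has been requested for the method.
bool ImplicitTailCallsEnabled(std::uint32_t jitFlags)
{
    return (jitFlags & (DebugCode | MinOpt)) == 0;
}
```

Because the check is a runtime query on per-method flags, it needs no architecture-specific define, which fits the point that the interpreter's optimization is architecture neutral.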
- Disable ITC when CORJIT_FLAG_DEBUG_CODE or CORJIT_FLAG_MIN_OPT is set, matching the JIT's behavior of only performing implicit tail calls when optimizations are enabled.
- Remove redundant m_hasAddressExposedLocals assignment from EmitLdLocA (already set by the pre-scan in CreateBasicBlocks).
- Add comments explaining why the pre-scan sets flags before GenerateCode: GenerateCode processes IL sequentially, so a call;ret before ldloca/localloc would miss the flag without the pre-scan.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
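
The ordering problem named in the last bullet can be illustrated with a toy model. Everything here is a hypothetical simplification: the `Op` enum stands in for decoded IL, and `SinglePassWouldPromote` / `PreScanWouldPromote` are illustrative names, not functions from the interpreter. The point is only that a decision made at the call site with flags accumulated so far differs from one made after scanning the whole method.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical decoded opcodes, including short and wide address-taking forms.
enum class Op { Br, Call, Ret, LdlocaS, Ldloca, LdargaS, Ldarga, Localloc, Other };

struct MethodFlags {
    bool hasAddressExposedLocals = false;
    bool hasLocalloc = false;
};

// Records the effect of one instruction on the method-level flags.
void Observe(MethodFlags& flags, Op op)
{
    if (op == Op::LdlocaS || op == Op::Ldloca || op == Op::LdargaS || op == Op::Ldarga)
        flags.hasAddressExposedLocals = true;
    if (op == Op::Localloc)
        flags.hasLocalloc = true;
}

// Scans the whole method up front, independent of instruction order.
MethodFlags PreScan(const std::vector<Op>& il)
{
    MethodFlags flags;
    for (Op op : il)
        Observe(flags, op);
    return flags;
}

bool IsCallRet(const std::vector<Op>& il, std::size_t i)
{
    return il[i] == Op::Call && i + 1 < il.size() && il[i + 1] == Op::Ret;
}

// Decides at each call using only the flags seen so far in IL order.
bool SinglePassWouldPromote(const std::vector<Op>& il)
{
    MethodFlags seen;
    for (std::size_t i = 0; i < il.size(); ++i) {
        Observe(seen, il[i]);
        if (IsCallRet(il, i) && !seen.hasAddressExposedLocals && !seen.hasLocalloc)
            return true; // promoted without knowing about later ldloca/localloc
    }
    return false;
}

// Decides with flags computed over the entire method first.
bool PreScanWouldPromote(const std::vector<Op>& il)
{
    MethodFlags flags = PreScan(il);
    if (flags.hasAddressExposedLocals || flags.hasLocalloc)
        return false;
    for (std::size_t i = 0; i < il.size(); ++i)
        if (IsCallRet(il, i))
            return true;
    return false;
}
```

For a method laid out as `br; call; ret; ldloca.s; ...; br`, where the branch makes the `ldloca.s` execute before the `call`, the single-pass variant would promote the call while the pre-scan variant would not.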
Good question. F# emits the explicit tail. prefix only in the Release configuration. With the optimization flags in place, though, ITC will not be used in Debug. I disabled the test in the Debug configuration.
Huh. Is that a bug?
Why do we need this fix at all then? If you run the Debug configuration, F# won't produce the tail. prefix and the interpreter optimization will be disabled, so the scenario won't work. If you run the Release configuration, F# will produce the tail. prefix and the interpreter will respect it, so the scenario will work even without this fix.