
JIT: improve throughput of the RLCSE greedy heuristic #98906

Merged: AndyAyersMS merged 1 commit into dotnet:main from AndyAyersMS:ImproveRLCSE_TP on Feb 26, 2024

@AndyAyersMS AndyAyersMS commented Feb 25, 2024

Profiling showed that `GetFeatures` was a major factor in throughput. For the most part, the features of CSE candidates don't change as we perform CSEs, so build in some logic to avoid recomputing the feature set unless there is some evidence that features have changed.

To avoid having to remove already performed candidates from the candidate vector, we now tag them as `m_performed` so they get ignored during subsequent processing and are discarded if we ever recompute features.

This should cut the TP impact roughly in half; the remaining part seems to largely be from doing more CSEs (which we hope will show some perf benefit).

I also took advantage of #98434, so in the symbol table listing the CSE temps now show which CSE candidate inspired them.

Contributes to #92915.
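
As an illustration of the caching idea described above, here is a minimal C++ sketch. It is not the JIT's code: `Candidate`, `GreedyPolicy`, `featuresStale`, `Score`, and `RunGreedy` are hypothetical names, the two "features" are placeholders, and the loop performs every candidate rather than modeling a stopping decision. Only `GetFeatures` and `m_performed` are names taken from the description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CSE candidate; not the JIT's real type.
struct Candidate
{
    int    index;               // candidate number
    double useCount;            // placeholder raw inputs to the feature set
    double cost;
    bool   m_performed = false; // set once this CSE has been performed
    double features[2] = {0.0, 0.0};

    Candidate(int i, double u, double c) : index(i), useCount(u), cost(c) {}
};

struct GreedyPolicy
{
    std::vector<Candidate> candidates;
    bool featuresStale       = true; // assumption: some signal says features may have changed
    int  featureComputations = 0;    // counts GetFeatures calls, for illustration only

    void GetFeatures(Candidate& c)
    {
        featureComputations++;
        c.features[0] = c.useCount;
        c.features[1] = c.cost;
    }

    // Recompute features only when flagged stale, and discard performed
    // candidates at that point instead of erasing them eagerly.
    void RefreshIfNeeded()
    {
        if (!featuresStale)
        {
            return;
        }
        std::vector<Candidate> live;
        for (Candidate& c : candidates)
        {
            if (c.m_performed)
            {
                continue;
            }
            GetFeatures(c);
            live.push_back(c);
        }
        candidates    = live;
        featuresStale = false;
    }

    static double Score(const Candidate& c)
    {
        return c.features[0] * c.features[1]; // placeholder preference function
    }

    // Returns the position of the best unperformed candidate, or -1 if none remain.
    int PickBest()
    {
        RefreshIfNeeded();
        int    best      = -1;
        double bestScore = 0.0;
        for (std::size_t i = 0; i < candidates.size(); i++)
        {
            if (candidates[i].m_performed)
            {
                continue; // tagged candidates are skipped, not removed
            }
            double s = Score(candidates[i]);
            if (best < 0 || s > bestScore)
            {
                best      = static_cast<int>(i);
                bestScore = s;
            }
        }
        return best;
    }

    void Perform(int pos, bool invalidatesFeatures)
    {
        candidates[pos].m_performed = true;
        if (invalidatesFeatures)
        {
            featuresStale = true;
        }
    }
};

// Greedily performs all candidates and returns the candidate numbers in order.
std::vector<int> RunGreedy(GreedyPolicy& policy, bool invalidateEachStep)
{
    std::vector<int> order;
    for (int pos = policy.PickBest(); pos >= 0; pos = policy.PickBest())
    {
        order.push_back(policy.candidates[pos].index);
        policy.Perform(pos, invalidateEachStep);
    }
    return order;
}
```

In this sketch, when no step invalidates the cached features, `GetFeatures` runs once per candidate in total; when every step invalidates them, it runs again for each remaining candidate after each CSE, which is the cost the change aims to avoid.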

@ghost ghost assigned AndyAyersMS Feb 25, 2024
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 25, 2024

ghost commented Feb 25, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


AndyAyersMS commented

@EgorBo PTAL
cc @dotnet/jit-contrib

Note the heuristic is disabled here so this will look like a no-diff change. I will cherry-pick this into #98776 where the new heuristic is enabled.

@AndyAyersMS
Copy link
Copy Markdown
Member Author

One note on "unchanging" features. I realized while working on this that "is live across call" is a volatile feature (say we CSE a helper call) but we have no easy way to discover or recompute this.

Likewise with "is LSRA live across call" though that does get recomputed if we recompute for other reasons.


EgorBo commented Feb 26, 2024

> say we CSE a helper call) but we have no easy way to discover or recompute this.

Also, some helper calls have custom calling conventions (e.g. write barriers), so presumably CSE candidates don't suffer from living across those.

AndyAyersMS commented

> > say we CSE a helper call) but we have no easy way to discover or recompute this.
>
> Also, some helper calls have custom calling conventions (e.g. write barriers), so presumably CSE candidates don't suffer from living across those.

I'll have to check, but I don't think CSE analysis takes potential write barrier calls (or any other late-introduced call) into account.

AndyAyersMS commented

The "is LSRA live across call" feature is potentially expensive, as it could walk most of the flow graph per candidate. I could make this more efficient by doing just one walk, but that would require revising the flow of candidate costing. I will try to get a handle on the cost first.

Pareto frontier data suggests that there is a pretty hard tradeoff between perf score and code size (and hence, I'm guessing, TP), so it is not clear how much better things can get here.
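
To make the "one walk" idea for the "lsra live across" feature concrete, here is a rough C++ sketch. It assumes a heavily simplified model in which each block already records whether it contains a call and which candidates are live across it; `Block`, `LiveAcrossCallPerCandidate`, and `LiveAcrossCallOneWalk` are hypothetical names, and the JIT's actual flow graph and liveness representation are not shown in this conversation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-block summary; the real JIT data structures differ.
struct Block
{
    bool             hasCall;        // block contains at least one call
    std::vector<int> liveCandidates; // candidate numbers live across this block
};

// One walk over the blocks per candidate: roughly candidates * blocks work.
std::vector<bool> LiveAcrossCallPerCandidate(const std::vector<Block>& blocks, int numCandidates)
{
    std::vector<bool> result(numCandidates, false);
    for (int c = 0; c < numCandidates; c++)
    {
        for (const Block& b : blocks)
        {
            if (!b.hasCall)
            {
                continue;
            }
            for (int live : b.liveCandidates)
            {
                if (live == c)
                {
                    result[c] = true;
                }
            }
        }
    }
    return result;
}

// A single walk over the blocks that updates every candidate as it goes.
std::vector<bool> LiveAcrossCallOneWalk(const std::vector<Block>& blocks, int numCandidates)
{
    std::vector<bool> result(numCandidates, false);
    for (const Block& b : blocks)
    {
        if (!b.hasCall)
        {
            continue;
        }
        for (int live : b.liveCandidates)
        {
            if (live >= 0 && live < numCandidates)
            {
                result[live] = true;
            }
        }
    }
    return result;
}
```

Both functions produce the same answer in this model; the second visits each block once, which is why it would require computing the feature for all candidates together rather than on demand per candidate.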

@AndyAyersMS AndyAyersMS merged commit d0255d2 into dotnet:main Feb 26, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 28, 2024