Improve flame graph performance. by mstange · Pull Request #4820 · firefox-devtools/profiler

mstange · 2023-11-28T21:29:08Z

Before: https://share.firefox.dev/47ydcmp
After: https://share.firefox.dev/40Y48og

The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that we had to re-do the sort every time the selection changed.

Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost (and it's still cheaper than if you were looking at the entire call tree with the old implementation). Then we cache the "ordered rows" and don't have to sort again on each range selection change.
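The "sort once, then cache" idea can be illustrated with a minimal sketch. This is not the code from this PR: the table shape (`prefix`, `name`), the alphabetical sibling order, and the names `computeOrderedRows`, `getOrderedRows` and `sortPassCount` are all assumptions made up for the illustration.

```javascript
// Hypothetical, simplified call node table: parallel arrays where prefix[i] is
// the parent of node i (-1 for roots) and name[i] is its function name.
const orderedRowsCache = new WeakMap();
let sortPassCount = 0; // Only here to show how often the expensive work runs.

// Expensive step: sort all siblings of the whole table and build one row per depth.
function computeOrderedRows(callNodeTable) {
  sortPassCount++;
  const { prefix, name } = callNodeTable;
  const children = new Map(); // parent index -> child indexes
  for (let node = 0; node < prefix.length; node++) {
    if (!children.has(prefix[node])) {
      children.set(prefix[node], []);
    }
    children.get(prefix[node]).push(node);
  }
  // Assumed ordering for this sketch: siblings sorted by function name.
  for (const siblings of children.values()) {
    siblings.sort((a, b) => (name[a] < name[b] ? -1 : name[a] > name[b] ? 1 : 0));
  }
  const rows = [];
  let row = children.get(-1) ?? [];
  while (row.length > 0) {
    rows.push(row);
    // The next row lists the sorted children of each node, in row order.
    row = row.flatMap((node) => children.get(node) ?? []);
  }
  return rows;
}

// Cheap step: reuse the rows for as long as the same table object is passed in,
// no matter how often the range selection changes.
function getOrderedRows(callNodeTable) {
  let rows = orderedRowsCache.get(callNodeTable);
  if (rows === undefined) {
    rows = computeOrderedRows(callNodeTable);
    orderedRowsCache.set(callNodeTable, rows);
  }
  return rows;
}
```

With this shape, a selection change only needs to recompute the per-box timings on top of the cached rows; the sort itself is not repeated.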

In the future, this could be further optimized by doing the snapping earlier so that we only compute flameGraphTiming entries for visible boxes. Or we could even combine getFlameGraphTiming into the flame graph rendering - we just need to find a solution for the uses of FlameGraphTiming inside FlameGraph.js, i.e. for hover hit testing + keyboard selection.

codecov · 2023-11-28T21:38:36Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (7e2bb74) 88.28% compared to head (30ae592) 88.30%.

Files	Patch %	Lines
src/profile-logic/flame-graph.js	98.70%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4820      +/-   ##
==========================================
+ Coverage   88.28%   88.30%   +0.02%     
==========================================
  Files         301      301              
  Lines       26921    26969      +48     
  Branches     7276     7285       +9     
==========================================
+ Hits        23767    23815      +48     
  Misses       2936     2936              
  Partials      218      218


mstange · 2023-11-29T03:09:01Z

Hmm, Codecov is right, I should write some tests for the new bisection functions.

julienw

Thanks, this looks like a good improvement!

r=me with some tests for the new bisect functions (+ some nits, see below)

julienw · 2023-11-29T15:37:17Z

  return low;
 }
+
+export function bisectionLeftByKey<T>(


I'd appreciate a small comment explaining the difference from the other functions (essentially, toKey provides the value that will be compared to x).

bisectionLeft could be implemented by providing the identity function as toKey, but how much of a penalty would we get from this change? I believe that for such a function we want the highest throughput, so it makes sense to keep both separate, but then it's also good to mention this decision in the comment.

Added the following comment:

/**
 * Like bisectionRight, but accepts a "key" function which maps an element of
 * the array to a number "key". The array must be sorted by that key.
 * The looked-up element `x` will be compared to the keys.
 */

I agree that we want high throughput on this function so I opted to duplicate the code. I haven't compared the two solutions though.
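For readers without the diff at hand, here is a minimal sketch of the duplication being discussed. The function bodies and signatures are assumptions written for this illustration (the real functions in the PR may take extra parameters); only the names `bisectionRight` and `toKey` come from the thread, and `bisectionRightByKey` is a hypothetical name.

```javascript
// Returns the index of the first element that is larger than x.
function bisectionRight(array, x) {
  let low = 0;
  let high = array.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (x < array[mid]) {
      high = mid;
    } else {
      low = mid + 1;
    }
  }
  return low;
}

// Same loop, but x is compared to toKey(element) instead of the element itself.
// The array must be sorted by that key.
function bisectionRightByKey(array, x, toKey) {
  let low = 0;
  let high = array.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (x < toKey(array[mid])) {
      high = mid;
    } else {
      low = mid + 1;
    }
  }
  return low;
}

// Hypothetical usage: find the box whose start position is the last one <= 4.
const boxes = [{ start: 2 }, { start: 4 }, { start: 7 }];
const boxIndex = bisectionRightByKey(boxes, 4, (box) => box.start) - 1;
```

Passing the identity function as `toKey` gives the same results as the plain version; the duplicated loop only avoids one extra function call per comparison, and whether that matters in practice was not measured in this thread.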

I've also added a big comment at the top of the file which gives better guidance on when to use left vs right:

/**
 * These functions are used on sorted arrays.
 *
 * You commonly use them in one of two cases:
 *
 *  1. When you want to insert a new element into a sorted array, at a position
 *     such that the array remains in sorted order. Or
 *  2. When you have a sorted array of interval start positions, and you want
 *     to find out which interval includes a certain number.
 *
 * For case 1, you can use either bisectionRight or bisectionLeft. The difference
 * only matters if you care about the positioning of two equal elements. For
 * example:
 *   bisectionRight([1, 2, 3], 2) === 2
 *   bisectionLeft([1, 2, 3], 2) === 1
 * i.e. the returned index is either to the right or to the left of the equal
 * element. If no exactly matching element is present in the array, both functions
 * return the same value, i.e. the index of the first element that's larger than
 * the passed in element (which is the index at which that element would need to
 * be inserted).
 *
 * ```js
 * const insertionIndexR = bisectionRight(array, x);
 * assert(array[insertionIndexR] > x);
 * assert(array[insertionIndexR - 1] <= x);
 *
 * const insertionIndexL = bisectionLeft(array, x);
 * assert(array[insertionIndexL] >= x);
 * assert(array[insertionIndexL - 1] < x);
 * ```
 *
 * For case 2, you'll have to use bisectionRight, and subtract 1 from the return
 * value. For example, if you have the half-open intervals 2..4, 4..7, 7..Infinity,
 * then your start position array will be [2, 4, 7] and bisectionRight() - 1 will
 * be the index of the last interval whose start position is <= the checked element.
 *   bisectionRight([2, 4, 7], 1) - 1 === -1
 *   bisectionRight([2, 4, 7], 2) - 1 === 0
 *   bisectionRight([2, 4, 7], 3) - 1 === 0
 *   bisectionRight([2, 4, 7], 4) - 1 === 1
 *   bisectionRight([2, 4, 7], 5) - 1 === 1
 *   bisectionRight([2, 4, 7], 6) - 1 === 1
 *   bisectionRight([2, 4, 7], 7) - 1 === 2
 *   bisectionRight([2, 4, 7], 20) - 1 === 2
 *
 * Example code:
 *
 * ```js
 * const intervalStarts = [2, 4, 7];
 * const insertionIndex = bisectionRight(intervalStarts, x);
 * if (insertionIndex === 0) {
 *   // x is before the first interval.
 *   return null;
 * }
 *
 * const intervalIndex = insertionIndex - 1;
 * assert(x >= intervalStarts[intervalIndex]);
 *
 * // If there can be gaps between your intervals, you also need to check the
 * // interval end:
 * if (x >= intervalEnds[intervalIndex]) {
 *   // x isn't actually inside this interval! It's in the gap between
 *   // intervalIndex and intervalIndex+1.
 *   return null;
 * }
 *
 * // Now we know that x is inside the interval.
 * assert(x >= intervalStarts[intervalIndex] && x < intervalEnds[intervalIndex]);
 * return intervalIndex;
 * ```
 */

julienw · 2023-11-29T17:52:26Z

+  //
+  // For row k, callNodesAtDepth[k] is partitioned as follows:
+  //
+  //  - callNodesAtDepth[k][0..pendingRangeStartAtDepth[k]] are "finished"


Do you want pendingRangeStartAtDepth[k] - 1 as the last index? Or use ) as an exclusive range end?
(I think the first option is more explicit.)

I wish there were a commonly accepted unambiguous notation! Here I was using the Rust slice syntax, where 0..2 does not include 2 (but 0..=2 does).

This now says:

// For row d, flameGraphRows[d] is partitioned as follows (a..b is the half-open
// range which includes a but excludes b):
//
//  - flameGraphRows[d][0..pendingRangeStartAtDepth[d]] are "finished"
//  - flameGraphRows[d][pendingRangeStartAtDepth[d]..] are "pending"
//
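In JavaScript terms, the half-open a..b notation matches what `Array.prototype.slice(a, b)` does. A small sketch, with a made-up row and a made-up pending start value (neither is taken from the PR):

```javascript
// Hypothetical row of call node indexes at some depth d.
const row = [10, 11, 12, 13];
// Hypothetical value of pendingRangeStartAtDepth[d].
const pendingRangeStart = 2;

// row[0..pendingRangeStart]: includes index 0, excludes index pendingRangeStart.
const finished = row.slice(0, pendingRangeStart);
// row[pendingRangeStart..]: everything from pendingRangeStart to the end.
const pending = row.slice(pendingRangeStart);
```

Because the boundary index belongs to exactly one side, the two parts always add up to the full row, and a row is fully finished exactly when `pendingRangeStart === row.length`.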

julienw · 2023-11-29T18:05:45Z

+      // Check if the current row contains any other pending call nodes.
+      while (indexInCandidateRow === candidateRow.length) {
+        // There are no more pending nodes in the current row.


The comments could be improved to explain the loop: reading the comment, we'd expect a simple if condition. On first read I was wondering whether the 2 while loops here should somehow be reversed ("changing the row"-while outside, "moving inside the row"-while inside).
I'm still not sure :-) and maybe we could find a way to reduce the number of while loops, but maybe not.

I tried a solution where I call a function called _findNextNodeForProcessing which has the order of the while loops that you suggested: c185d4b
But it increased the runtime of computeFlameGraphRows from 122ms to 146ms. Before / after

julienw · 2023-11-29T18:15:48Z

+          // We're completely done.
+          break outer;
+        }
+        // Go up a level and continue the search there.


Do we need to "finish" this level? We do it for the first one above, but what if we move up several times? It may not be a problem if we never come back to this depth, though (it's not clear to me).

If we get to this line, we have already finished this level. We get here if indexInCandidateRow === candidateRow.length, which means that pendingRangeStartAtDepth[candidateDepth] === flameGraphRows[candidateDepth].length, which means that all nodes at the current depth are already finished.

julienw · 2023-11-29T18:20:29Z

+    pendingRangeStartAtDepth[candidateDepth] = indexInCandidateRow + 1;
+
+    // Advance to candidateNode's first child.
+    currentCallNode = candidateNode + 1; // "currentCallNode = firstChild[candidateNode];"


This is the first child because of how the call node is sorted, right? (just checking)

correct, I'll tweak the comment
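A small sketch of why `candidateNode + 1` works, assuming the table stores nodes in depth-first (pre-order) order so that a node's first child, if it has one, comes immediately after it. The `prefix` array and the helper name `firstChildOrNull` are hypothetical and only serve this illustration:

```javascript
// prefix[i] is the parent of node i (-1 for roots). Nodes are assumed to be
// stored in depth-first pre-order.
function firstChildOrNull(prefix, node) {
  const candidate = node + 1;
  // In pre-order, the next node is the first child exactly when its parent is `node`.
  return candidate < prefix.length && prefix[candidate] === node ? candidate : null;
}

// Tree: 0 -> (1 -> 2), (3 -> 4)
const examplePrefix = [-1, 0, 1, 0, 3];
const firstChildren = examplePrefix.map((_, node) => firstChildOrNull(examplePrefix, node));
```

In the reviewed code the `+ 1` is used without the parent check; this sketch adds the check only so that it also handles leaf nodes on its own.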

julienw · 2023-11-29T18:22:17Z

-    );
+    // Now currentCallNode and all its siblings have been added to the row, and
+    // they are ordered correctly. Find a queued up node in this row which has
+    // children, and descend into it.


Sometimes we "descend" and sometimes we "go up"; it might be useful to be more explicit about the various cases, especially since the only goal for the rest of this loop is to find the next value for currentCallNode. Maybe this part could be extracted into a function to make it easier to follow (but maybe not).

Would it help if I replaced "descend" with "go down" or "go deeper"?

I'll try to make the comments a bit clearer.

I'm not sure if anything can be easily pulled out into a separate function; the algorithm would be easier to follow if it used recursion, but then it would use recursion... and I'm a bit afraid of the stack usage for profiles with deep stacks, and of the function call overhead.
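To make the trade-off concrete, here is a generic sketch of the same pre-order traversal written both ways. It is not the algorithm from this PR; the `children` adjacency array and both function names are assumptions for illustration only.

```javascript
// children[node] lists the child indexes of node.

// Recursive version: short, but uses one native stack frame per tree level.
function visitPreOrderRecursive(children, node, visit) {
  visit(node);
  for (const child of children[node]) {
    visitPreOrderRecursive(children, child, visit);
  }
}

// Iterative version: keeps its own stack on the heap, so the depth of the tree
// does not translate into native call depth.
function visitPreOrderIterative(children, root, visit) {
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    visit(node);
    const kids = children[node];
    // Push in reverse so that the first child is visited first.
    for (let i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
  }
}
```

Both visit nodes in the same order on small trees. On a very deep chain of nodes, the recursive version can exceed the engine's call stack limit, while the iterative one only grows an array.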

Fixes firefox-devtools#844.

mstange · 2023-11-30T21:53:39Z

Ugh, my fixes didn't make it into this PR. I pushed but didn't double-check that the push went through (there was a lint failure because of a last minute comment change). New PR coming up...

mstange requested a review from julienw November 28, 2023 21:29

mstange self-assigned this Nov 28, 2023

Work around Math.abs being slow in Firefox in React development environments.

0b537cb

mstange force-pushed the faster-flamegraph branch from 0245674 to b0fd162 Compare November 28, 2023 21:31

mstange force-pushed the faster-flamegraph branch from b0fd162 to 739d104 Compare November 28, 2023 21:39

julienw approved these changes Nov 29, 2023

mstange force-pushed the faster-flamegraph branch from 739d104 to 1de3e07 Compare November 30, 2023 03:09

mstange force-pushed the faster-flamegraph branch from 1de3e07 to 30ae592 Compare November 30, 2023 03:15

mstange merged commit 469a598 into firefox-devtools:main Nov 30, 2023

mstange added a commit to mstange/perf.html that referenced this pull request Nov 30, 2023

Apply missed fixes for PR firefox-devtools#4820.

cb21da5

mstange mentioned this pull request Nov 30, 2023

Apply missed fixes for PR #4820. #4824

Merged

mstange added a commit that referenced this pull request Dec 1, 2023

Apply missed fixes for PR #4820. (Merge PR #4824)

625135b

julienw mentioned this pull request Dec 5, 2023

Deploy Dec 5, 2023 #4832

Merged

mstange added the perf Issues where the profiler itself is slow. label Dec 7, 2023
