Skip to content

Improve flame graph performance.#4820

Merged
mstange merged 2 commits into
firefox-devtools:mainfrom
mstange:faster-flamegraph
Nov 30, 2023
Merged

Improve flame graph performance.#4820
mstange merged 2 commits into
firefox-devtools:mainfrom
mstange:faster-flamegraph

Conversation

@mstange

@mstange mstange commented Nov 28, 2023

Copy link
Copy Markdown
Contributor

Main branch | Deploy preview

Before: https://share.firefox.dev/47ydcmp
After: https://share.firefox.dev/40Y48og

The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that we had to re-do the sort every time the selection changed.

Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost (and it's still cheaper than if you were looking at the entire call tree with the old implementation). Then we cache the "ordered rows" and don't have to sort again on each range selection change.

In the future, this could be further optimized by doing the snapping earlier so that we only compute flameGraphTiming entries for visible boxes. Or we could even combine getFlameGraphTiming into the flame graph rendering - we just need to find a solution for the uses of FlameGraphTiming inside FlameGraph.js, i.e. for hover hit testing + keyboard selection.

@mstange mstange requested a review from julienw November 28, 2023 21:29
@mstange mstange self-assigned this Nov 28, 2023
@codecov

codecov Bot commented Nov 28, 2023

Copy link
Copy Markdown

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (7e2bb74) 88.28% compared to head (30ae592) 88.30%.

Files Patch % Lines
src/profile-logic/flame-graph.js 98.70% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4820      +/-   ##
==========================================
+ Coverage   88.28%   88.30%   +0.02%     
==========================================
  Files         301      301              
  Lines       26921    26969      +48     
  Branches     7276     7285       +9     
==========================================
+ Hits        23767    23815      +48     
  Misses       2936     2936              
  Partials      218      218              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mstange

mstange commented Nov 29, 2023

Copy link
Copy Markdown
Contributor Author

Hmm, Codecov is right, I should write some tests for the new bisection functions.

@julienw julienw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a good improvement !

r=me with some tests for the new bisect functions (+ some nits, see below)

Comment thread src/profile-logic/call-tree.js
Comment thread src/utils/bisect.js Outdated
return low;
}

export function bisectionLeftByKey<T>(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd apppreciate a small comment explaining the difference to the other functions (essentially, toKey provides the value that will be compared to x)

bisectionLeft could be implemented by providing the identity function as toKey, but how much penalty we'd get from this change? I believe that for such a function we want the highest throughput so it makes sense to keep both separate, but then it's also good to mention this decision in the comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the following comment:

/**
 * Like bisectionRight, but accepts a "key" function which maps an element of
 * the array to a number "key". The array must be sorted by that key.
 * The looked-up element `x` will be compared to the keys.
 */

I agree that we want high throughput on this function so I opted to duplicate the code. I haven't compared the two solutions though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also added a big comment at the top of the file which gives better guidance on when to use left vs right:

/**
 * These functions are used on sorted arrays.
 *
 * You commonly use them in one of two cases:
 *
 *  1. When you want to insert a new element into a sorted array, at a position
 *     such that the array remains in sorted order. Or
 *  2. When you have a sorted array of interval start positions, and you want
 *     to find out which interval includes a certain number.
 *
 * For case 1, you can use either bisectionRight or bisectionLeft. The difference
 * only matters if you care about the positioning of two equal elements. For
 * example:
 *   bisectionRight([1, 2, 3], 2) === 2
 *   bisectionLeft([1, 2, 3], 2) === 1
 * i.e. the returned index is either to the right or to the left of the equal
 * element. If no exactly matching element is present in the array, both functions
 * return the same value, i.e. the index of the first element that's larger than
 * the passed in element (which is the index at which that element would need to
 * be inserted).
 *
 * ```js
 * const insertionIndexR = bisectionRight(array, x);
 * assert(array[insertionIndexR] > x);
 * assert(array[insertionIndexR - 1] <= x);
 *
 * const insertionIndexL = bisectionLeft(array, x);
 * assert(array[insertionIndexL] >= x);
 * assert(array[insertionIndexL - 1] < x);
 * ```
 *
 * For case 2, you'll have to use bisectionRight, and subtract 1 from the return
 * value. For example, if you have the half-open intervals 2..4, 4..7, 7..Infinity,
 * then your start position array will be [2, 4, 7] and bisectionRight() - 1 will
 * be the index of the last interval whose start position is <= the checked element.
 *   bisectionRight([2, 4, 7], 1) - 1 === -1
 *   bisectionRight([2, 4, 7], 2) - 1 === 0
 *   bisectionRight([2, 4, 7], 3) - 1 === 0
 *   bisectionRight([2, 4, 7], 4) - 1 === 1
 *   bisectionRight([2, 4, 7], 5) - 1 === 1
 *   bisectionRight([2, 4, 7], 6) - 1 === 1
 *   bisectionRight([2, 4, 7], 7) - 1 === 2
 *   bisectionRight([2, 4, 7], 20) - 1 === 2
 *
 * Example code:
 *
 * ```js
 * const intervalStarts = [2, 4, 7];
 * const insertionIndex = bisectionRight(intervalStarts, x);
 * if (insertionIndex === 0) {
 *   // x is before the first interval.
 *   return null;
 * }
 *
 * const intervalIndex = insertionIndex - 1;
 * assert(x >= intervalStarts[intervalIndex]);
 *
 * // If there can be gaps between your intervals, you also need to check the
 * // interval end:
 * if (x >= intervalEnds[intervalIndex]) {
 *   // x isn't actually inside this interval! It's in the gap between
 *   // intervalIndex and intervalIndex+1.
 *   return null;
 * }
 *
 * // Now we know that x is inside the interval.
 * assert(x >= intervalStarts[intervalIndex] && x < intervalEnds[intervalIndex]);
 * return intervalIndex;
 * ```
 */

Comment thread src/utils/bisect.js Outdated
Comment thread src/profile-logic/flame-graph.js
Comment thread src/profile-logic/flame-graph.js Outdated
Comment thread src/profile-logic/flame-graph.js Outdated
//
// For row k, callNodesAtDepth[k] is partitioned as follows:
//
// - callNodesAtDepth[k][0..pendingRangeStartAtDepth[k]] are "finished"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want pendingRangeStartAtDepth[k] - 1 as the last index? Or use ) as a excluding range end?
(I think using the first option is more explicit)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there were a commonly accepted unambiguous notation! Here I was using the Rust slice syntax, where 0..2 does not include 2 (but 0..=2 does).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now says:

  // For row d, flameGraphRows[d] is partitioned as follows (a..b is the half-open
  // range which includes a but excludes b):
  //
  //  - flameGraphRows[d][0..pendingRangeStartAtDepth[d]] are "finished"
  //  - flameGraphRows[d][pendingRangeStartAtDepth[d]..] are "pending"
  //

Comment on lines +175 to +178
// Check if the current row contains any other pending call nodes.
while (indexInCandidateRow === candidateRow.length) {
// There are no more pending nodes in the current row.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments could be improved to explain the loop: indeed reading the comment we'd except a simple if condition. At the first read I was wondering if the 2 while here should be somehow reversed ("changing the row"-while outside, "moving inside the row"-while inside).
I'm still not sure :-) and maybe we could find a way to reduce the amount of while loops, but maybe not

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a solution where I call a function called _findNextNodeForProcessing which has the order of the while loops that you suggested: c185d4b
But it increased the runtime of computeFlameGraphRows from 122ms to 146ms. Before / after

// We're completely done.
break outer;
}
// Go up a level and continue the search there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to "finish" this level? We do it for the first one above, but what about if we move up several times? It may not be a problem if we never come back at this depth though (it's not clear to me).

@mstange mstange Nov 30, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get to this line, we have already finished this level. We get here if indexInCandidateRow === candidateRow.length, which means that pendingRangeStartAtDepth[candidateDepth] === flameGraphRows[candidateDepth].length, which means that all nodes in at the current depth are already finished.

pendingRangeStartAtDepth[candidateDepth] = indexInCandidateRow + 1;

// Advance to candidateNode's first child.
currentCallNode = candidateNode + 1; // "currentCallNode = firstChild[candidateNode];"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first child because of how the call node is sorted, right? (just checking)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, I'll tweak the comment

);
// Now currentCallNode and all its siblings have been added to the row, and
// they are ordered correctly. Find a queued up node in this row which has
// children, and descend into it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we "descend" and sometimes we "go up", it might be useful to be more explicit about the various cases. Especially that the only goal for the rest of this loop is to find the next value for currentCallNode. That would make it easier to follow. Maybe it would be possible to extract this part in a function to make it easier to follow (but maybe not).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help if I replaced "descend" with "go down" or "go deeper"?

I'll try to make the comments a bit clearer.

I'm not sure if anything can be easily pulled out into a separate function; the algorithm would be easier to follow if it used recursion, but then it would use recursion... and I'm a bit afraid of the stack usage for profiles with deep stacks, and of the function call overhead.

Before: https://share.firefox.dev/47ydcmp
After: https://share.firefox.dev/40Y48og

Fixes firefox-devtools#844.

The old code was saving time by sorting siblings only for the nodes that
were displayed based on the current (preview) range selection. This made
it cheaper to compute the flame graph for a small range, but it meant
that we had to re-do the sort every time the selection changed.

Now we do the sorting once, based on the entire call node table. This is
expensive but it is a one-time cost (and it's still cheaper than if you
were looking at the entire call tree with the old implementation).
Then we cache the "ordered rows" and don't have to sort again on each
range selection change.
@mstange mstange merged commit 469a598 into firefox-devtools:main Nov 30, 2023
@mstange

mstange commented Nov 30, 2023

Copy link
Copy Markdown
Contributor Author

Ugh, my fixes didn't make it into this PR. I pushed but didn't double-check that the push went through (there was a lint failure because of a last minute comment change). New PR coming up...

mstange added a commit to mstange/perf.html that referenced this pull request Nov 30, 2023
mstange added a commit that referenced this pull request Dec 1, 2023
@julienw julienw mentioned this pull request Dec 5, 2023
@mstange mstange added the perf Issues where the profiler itself is slow. label Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

perf Issues where the profiler itself is slow.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants