Improve flame graph performance.#4820
Conversation
0245674 to
b0fd162
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #4820 +/- ##
==========================================
+ Coverage 88.28% 88.30% +0.02%
==========================================
Files 301 301
Lines 26921 26969 +48
Branches 7276 7285 +9
==========================================
+ Hits 23767 23815 +48
Misses 2936 2936
Partials 218 218 ☔ View full report in Codecov by Sentry. |
b0fd162 to
739d104
Compare
|
Hmm, Codecov is right, I should write some tests for the new bisection functions. |
julienw
left a comment
There was a problem hiding this comment.
Thanks, this looks like a good improvement !
r=me with some tests for the new bisect functions (+ some nits, see below)
| return low; | ||
| } | ||
|
|
||
| export function bisectionLeftByKey<T>( |
There was a problem hiding this comment.
I'd apppreciate a small comment explaining the difference to the other functions (essentially, toKey provides the value that will be compared to x)
bisectionLeft could be implemented by providing the identity function as toKey, but how much penalty we'd get from this change? I believe that for such a function we want the highest throughput so it makes sense to keep both separate, but then it's also good to mention this decision in the comment.
There was a problem hiding this comment.
Added the following comment:
/**
* Like bisectionRight, but accepts a "key" function which maps an element of
* the array to a number "key". The array must be sorted by that key.
* The looked-up element `x` will be compared to the keys.
*/I agree that we want high throughput on this function so I opted to duplicate the code. I haven't compared the two solutions though.
There was a problem hiding this comment.
I've also added a big comment at the top of the file which gives better guidance on when to use left vs right:
/**
* These functions are used on sorted arrays.
*
* You commonly use them in one of two cases:
*
* 1. When you want to insert a new element into a sorted array, at a position
* such that the array remains in sorted order. Or
* 2. When you have a sorted array of interval start positions, and you want
* to find out which interval includes a certain number.
*
* For case 1, you can use either bisectionRight or bisectionLeft. The difference
* only matters if you care about the positioning of two equal elements. For
* example:
* bisectionRight([1, 2, 3], 2) === 2
* bisectionLeft([1, 2, 3], 2) === 1
* i.e. the returned index is either to the right or to the left of the equal
* element. If no exactly matching element is present in the array, both functions
* return the same value, i.e. the index of the first element that's larger than
* the passed in element (which is the index at which that element would need to
* be inserted).
*
* ```js
* const insertionIndexR = bisectionRight(array, x);
* assert(array[insertionIndexR] > x);
* assert(array[insertionIndexR - 1] <= x);
*
* const insertionIndexL = bisectionLeft(array, x);
* assert(array[insertionIndexL] >= x);
* assert(array[insertionIndexL - 1] < x);
* ```
*
* For case 2, you'll have to use bisectionRight, and subtract 1 from the return
* value. For example, if you have the half-open intervals 2..4, 4..7, 7..Infinity,
* then your start position array will be [2, 4, 7] and bisectionRight() - 1 will
* be the index of the last interval whose start position is <= the checked element.
* bisectionRight([2, 4, 7], 1) - 1 === -1
* bisectionRight([2, 4, 7], 2) - 1 === 0
* bisectionRight([2, 4, 7], 3) - 1 === 0
* bisectionRight([2, 4, 7], 4) - 1 === 1
* bisectionRight([2, 4, 7], 5) - 1 === 1
* bisectionRight([2, 4, 7], 6) - 1 === 1
* bisectionRight([2, 4, 7], 7) - 1 === 2
* bisectionRight([2, 4, 7], 20) - 1 === 2
*
* Example code:
*
* ```js
* const intervalStarts = [2, 4, 7];
* const insertionIndex = bisectionRight(intervalStarts, x);
* if (insertionIndex === 0) {
* // x is before the first interval.
* return null;
* }
*
* const intervalIndex = insertionIndex - 1;
* assert(x >= intervalStarts[intervalIndex]);
*
* // If there can be gaps between your intervals, you also need to check the
* // interval end:
* if (x >= intervalEnds[intervalIndex]) {
* // x isn't actually inside this interval! It's in the gap between
* // intervalIndex and intervalIndex+1.
* return null;
* }
*
* // Now we know that x is inside the interval.
* assert(x >= intervalStarts[intervalIndex] && x < intervalEnds[intervalIndex]);
* return intervalIndex;
* ```
*/| // | ||
| // For row k, callNodesAtDepth[k] is partitioned as follows: | ||
| // | ||
| // - callNodesAtDepth[k][0..pendingRangeStartAtDepth[k]] are "finished" |
There was a problem hiding this comment.
Do you want pendingRangeStartAtDepth[k] - 1 as the last index? Or use ) as a excluding range end?
(I think using the first option is more explicit)
There was a problem hiding this comment.
I wish there were a commonly accepted unambiguous notation! Here I was using the Rust slice syntax, where 0..2 does not include 2 (but 0..=2 does).
There was a problem hiding this comment.
This now says:
// For row d, flameGraphRows[d] is partitioned as follows (a..b is the half-open
// range which includes a but excludes b):
//
// - flameGraphRows[d][0..pendingRangeStartAtDepth[d]] are "finished"
// - flameGraphRows[d][pendingRangeStartAtDepth[d]..] are "pending"
//| // Check if the current row contains any other pending call nodes. | ||
| while (indexInCandidateRow === candidateRow.length) { | ||
| // There are no more pending nodes in the current row. |
There was a problem hiding this comment.
The comments could be improved to explain the loop: indeed reading the comment we'd except a simple if condition. At the first read I was wondering if the 2 while here should be somehow reversed ("changing the row"-while outside, "moving inside the row"-while inside).
I'm still not sure :-) and maybe we could find a way to reduce the amount of while loops, but maybe not
| // We're completely done. | ||
| break outer; | ||
| } | ||
| // Go up a level and continue the search there. |
There was a problem hiding this comment.
do we need to "finish" this level? We do it for the first one above, but what about if we move up several times? It may not be a problem if we never come back at this depth though (it's not clear to me).
There was a problem hiding this comment.
If we get to this line, we have already finished this level. We get here if indexInCandidateRow === candidateRow.length, which means that pendingRangeStartAtDepth[candidateDepth] === flameGraphRows[candidateDepth].length, which means that all nodes in at the current depth are already finished.
| pendingRangeStartAtDepth[candidateDepth] = indexInCandidateRow + 1; | ||
|
|
||
| // Advance to candidateNode's first child. | ||
| currentCallNode = candidateNode + 1; // "currentCallNode = firstChild[candidateNode];" |
There was a problem hiding this comment.
This is the first child because of how the call node is sorted, right? (just checking)
There was a problem hiding this comment.
correct, I'll tweak the comment
| ); | ||
| // Now currentCallNode and all its siblings have been added to the row, and | ||
| // they are ordered correctly. Find a queued up node in this row which has | ||
| // children, and descend into it. |
There was a problem hiding this comment.
Sometimes we "descend" and sometimes we "go up", it might be useful to be more explicit about the various cases. Especially that the only goal for the rest of this loop is to find the next value for currentCallNode. That would make it easier to follow. Maybe it would be possible to extract this part in a function to make it easier to follow (but maybe not).
There was a problem hiding this comment.
Would it help if I replaced "descend" with "go down" or "go deeper"?
I'll try to make the comments a bit clearer.
I'm not sure if anything can be easily pulled out into a separate function; the algorithm would be easier to follow if it used recursion, but then it would use recursion... and I'm a bit afraid of the stack usage for profiles with deep stacks, and of the function call overhead.
739d104 to
1de3e07
Compare
Before: https://share.firefox.dev/47ydcmp After: https://share.firefox.dev/40Y48og Fixes firefox-devtools#844. The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that we had to re-do the sort every time the selection changed. Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost (and it's still cheaper than if you were looking at the entire call tree with the old implementation). Then we cache the "ordered rows" and don't have to sort again on each range selection change.
1de3e07 to
30ae592
Compare
|
Ugh, my fixes didn't make it into this PR. I pushed but didn't double-check that the push went through (there was a lint failure because of a last minute comment change). New PR coming up... |
Main branch | Deploy preview
Before: https://share.firefox.dev/47ydcmp
After: https://share.firefox.dev/40Y48og
The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that we had to re-do the sort every time the selection changed.
Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost (and it's still cheaper than if you were looking at the entire call tree with the old implementation). Then we cache the "ordered rows" and don't have to sort again on each range selection change.
In the future, this could be further optimized by doing the snapping earlier so that we only compute flameGraphTiming entries for visible boxes. Or we could even combine getFlameGraphTiming into the flame graph rendering - we just need to find a solution for the uses of FlameGraphTiming inside FlameGraph.js, i.e. for hover hit testing + keyboard selection.