Skip to content

Optimize call node table traversals#4807

Merged
mstange merged 15 commits into
firefox-devtools:mainfrom
mstange:sorted-callnodes
Nov 28, 2023
Merged

Optimize call node table traversals#4807
mstange merged 15 commits into
firefox-devtools:mainfrom
mstange:sorted-callnodes

Conversation

@mstange

@mstange mstange commented Nov 24, 2023

Copy link
Copy Markdown
Contributor

This patch set adds efficient ways to iterate over the children of a node, or to test whether a node is part of another node's subtree. This speeds up a bunch of different scenarios, and also allows some simplifications.

@mstange mstange requested a review from julienw November 24, 2023 04:40
@mstange mstange self-assigned this Nov 24, 2023
@codecov

codecov Bot commented Nov 24, 2023

Copy link
Copy Markdown

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (3e646c7) 88.26% compared to head (6915136) 88.28%.

❗ Current head 6915136 differs from pull request most recent head 868e6c6. Consider uploading reports for the commit 868e6c6 to get more accurate results

Files Patch % Lines
src/profile-logic/profile-data.js 99.01% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4807      +/-   ##
==========================================
+ Coverage   88.26%   88.28%   +0.01%     
==========================================
  Files         301      301              
  Lines       26912    26925      +13     
  Branches     7287     7277      -10     
==========================================
+ Hits        23755    23770      +15     
+ Misses       2939     2937       -2     
  Partials      218      218              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@julienw julienw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty solid to me!

I've found just a few nits, otherwise this seems to work pretty well and is good perf improvement to our code. I don't know why we haven't done at least part of it already!

Thanks

Comment thread src/types/profile-derived.js Outdated
Comment on lines +128 to +136
const currentLastChild: Array<IndexIntoCallNodeTable> = [];
let currentLastRoot = -1;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add comments explaining what these 2 variables will hold?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // An extra column that only gets used while the table is built up: For each
    // node A, currentLastChild[A] tracks the last currently-known child node of A.
    // It is updated whenever a new node is created; e.g. creating node B updates
    // currentLastChild[prefix[B]].
    // currentLastChild[A] is -1 while A has no children.
    const currentLastChild: Array<IndexIntoCallNodeTable> = [];

    // The last currently-known root node, i.e. the last known "child of -1".
    let currentLastRoot = -1;

Comment thread src/profile-logic/profile-data.js Outdated
Comment thread src/profile-logic/profile-data.js Outdated
Comment thread src/profile-logic/profile-data.js Outdated
Comment thread src/types/profile-derived.js Outdated
* - If a node A has no children, then A + 1 is its next sibling, or the closest
* next sibling of an ancestor node if A doesn't have a next sibling.
* - For every node A, there's a single "index range" which contains this node
* and all its descendants: [A, callNodeTable.nextAfterDescendants[A]).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: interestingly, in France, open-bounded intervals use the reversed square bracket, in this case this would be: [ A, callNodeTable.nextAfterDescendants[A] [. I see both are accepted in international standards so this works for me :-)

@mstange mstange Nov 28, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, in Germany we were taught "both are valid, just pick one". Both syntaxes trigger my subconscious "mismatched / unclosed delimiter" alarms.
Oh, actually, another option would be the Rust range syntax, A..callNodeTable.nextAfterDescendants[A] ... but then you need to know that the end is not included so it might not solve the ambiguity

Comment thread src/types/profile-derived.js Outdated
// The last node has nextAfterDescendants set to callNodeTable.length.
//
// The nodes in the range range [A, nextAfterDescendants[A]) form A's subtree.
nextAfterDescendants: Uint32Array, // IndexIntoCallNodeTable -> IndexIntoCallNodeTable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if a better name could be firstAfterDescendants. At least this is more immediately understandable to me.
But I'll let you choose.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be subtreeRangeEnd. I'm also not a huge fan of the name nextAfterDescendants.

* AfterSelected, 9
* AfterSelected, 5
* AfterSelected, 26
* ```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some form of this comment could still be useful in the new code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the comment to mapCallNodeSelectedStatesToSamples:

/**
 * Given the call node for each sample and the call node selected states,
 * compute each sample's selected state.
 *
 * For samples that are not filtered out, the sample's selected state is based
 * on the relation of the sample's call node to the selected call node: Any call
 * nodes in the selected node's subtree are "selected"; all other nodes are
 * either "before" or "after" the selected subtree.
 *
 * Call node tables are ordered in depth-first traversal order, so we can
 * determine whether a node is before, inside or after a subtree simply by
 * comparing the call node index to the "selected index range". Example:
 *
 * ```
 * before, 0
 *   before, 1
 *     before, 2
 *   before, 3
 * before, 4
 *   before, 5
 *     before, 6
 *     before, 7
 *       before, 8
 *   before, 9
 *     before, 10
 *       before, 11
 *     before, 12
 *     selected, 13 <-- selected node
 *       selected, 14
 *         selected, 15
 *           selected, 16
 *         selected, 17
 *       selected, 18
 *         selected, 19
 *         selected, 20
 *     after, 21
 *       after, 22
 *     after, 23
 *   after, 24
 *     after, 25
 * after, 26
 *   after, 27
 * ```
 *
 * In this example, the selected node has index 13 and the "selected index range"
 * is the range from 13 to 21 (not including 21).
 */

Comment thread src/profile-logic/profile-data.js Outdated
* This order can be different than the order of the rows that are displayed in the
* call tree, because it does not take any sample information into account. This
* makes it independent of any range selection and cheaper to compute.
* Call nodes are ordered by their index.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add a word about how the call node is sorted?

@mstange mstange Nov 28, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the following comment:

/**
 * Return a function that can compare two samples' call nodes, and determine
 * which node is "before" the other.
 * We use the call node index for this order. In the call node table, call nodes
 * are ordered in depth-first traversal order, so we can just compare those
 * indexes.
 *
 * This order is used for the activity graph. The tree order comparator is used
 * specifically for hit testing, but we also compare call nodes in the same way
 * in mapCallNodeSelectedStatesToSamples, which is what gets used for determining
 * which areas of the graph to draw in with the selection highlight fill.
 *
 * "Ordered after" means "swims on top in the activity graph".
 *
 * The depth-first traversal order has the nice property that the nodes of a
 * subtree are located in a contiguous index range. This means that the
 * highlighted area for a selected subtree is contiguous in the graph.
 */

Comment on lines 267 to +271
const getTreeOrderComparatorInFilteredThread: Selector<
(IndexIntoSamplesTable, IndexIntoSamplesTable) => number,
> = createSelector(
threadSelectors.getFilteredThread,
getCallNodeInfo,
(thread, { callNodeTable, stackIndexToCallNodeIndex }) => {
const sampleIndexToCallNodeIndex =
ProfileData.getSampleIndexToCallNodeIndex(
thread.samples.stack,
stackIndexToCallNodeIndex
);
return ProfileData.getTreeOrderComparator(
callNodeTable,
sampleIndexToCallNodeIndex
);
}
getSampleIndexToCallNodeIndexForFilteredThread,
ProfileData.getTreeOrderComparator

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a naive question and not directly related to this patch: Does the comparator need to change for the filtered thread? Can't we keep the same one computed for the full thread all the time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think it wouldn't work as well.

Here are the fundamental rules of the activity graph (this probably needs to be documented better):

  • Clicking a pixel in the activity graph should select a node which causes the clicked pixel to highlight.
  • Sibling nodes in the call tree should highlight different parts of the activity graph. In other words, selecting non-overlapping groups of samples should cause non-overlapping parts of the activity graph to highlight.
  • For each pixel column of the activity graph, the part of the pixel row that gets highlighted when a node is selected should be vertically contiguous.

Say the full thread has three sibling nodes A, B, and C, with B sorted in between A and C. Let's say in pixel column at x=100, hit testing the full thread activity graph picks A in the bottom third, B in the middle third and C in the upper third. Now you apply a transform which collapses A and C into the same node. If you select that collapsed node, which part of the pixel column is it going to highlight? It needs to highlight something contiguous. Let's say it highlights the lower 2/3rds of the column. Then you need hit testing to change, too, so that it doesn't pick the collapsed node when you click in the upper third (which previously selected C, or rather the samples for C, and these samples now belong to the collapsed node).

So yes I think the tree order comparator needs to change for the filtered thread, for the best experience.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'm not using the transforms enough so I don't always think about these use cases.

I don't understand well the case you describe though... Why after collapsing A and C to a new node (let's say A'), the same locations than "previous A and C" are not selected when selecting A'? In other words, why should that always be vertically contiguous in this case?

@mstange mstange Nov 29, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be reasonable, too. But it would make drawing more complicated. At the moment, the three states "before selected" / "selected" / "after selected" are enough to draw the graph. If we supported vertically disjointed highlights, we'd need to think of a different way to represent the sample selected states, i.e. some way to say "between selected1 and selected2" or something like that.

Furthermore, I think it might look more noisy visually. For example, let's say you use the JS-only call tree, but your highlights are still in the same position as in the combined call tree, then selecting a JS function might give you "striped" highlight patterns, because the selected function or its ancestors might be running in different JIT tiers which have different call paths in the combined call tree. But this is speculation, it might be fun to try it out!

Oh, and if you focus on a function, samples get filtered out. Would you leave those filtered-out samples in the same spots as in the full thread, too? Then the "filtered out" pattern would need to be drawn in disjointed areas, i.e. both under and over the focused part of the graph. I think the graph could quickly become a jumble of the different patterns (non-highlighted, highlighted, filtered out) if we don't enforce a consistent ordering between them.

@mstange mstange enabled auto-merge November 28, 2023 05:22
@mstange mstange merged commit 5610835 into firefox-devtools:main Nov 28, 2023
@julienw

julienw commented Nov 28, 2023

Copy link
Copy Markdown
Contributor

I hadn't looked at this other PR Do sorting in one pass previously so I gave it a look now. This looks good but I think we could do a better job at naming the variables, I find it a bit difficult to follow. Maybe later after reading with new eyes.

@julienw julienw mentioned this pull request Dec 5, 2023
@mstange mstange added the perf Issues where the profiler itself is slow. label Dec 7, 2023
mstange added a commit to mstange/perf.html that referenced this pull request Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

perf Issues where the profiler itself is slow.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants