Skip to content

DHAT: Optimize import time for large profiles#4640

Merged
julienw merged 3 commits into
firefox-devtools:mainfrom
flowln:optimize_dhat_import
Jun 6, 2023
Merged

DHAT: Optimize import time for large profiles#4640
julienw merged 3 commits into
firefox-devtools:mainfrom
flowln:optimize_dhat_import

Conversation

@flowln

@flowln flowln commented May 28, 2023

Copy link
Copy Markdown
Contributor

This commit changes the way in which the stackTable tree is constructed in DHAT's import algorithm.

Previously, the algorithm would look a bit like this:

    For each PP:
        For each frame in the PP, from top to bottom:
            Loop over all existing stackTable entries, until we find the
            one we want, or go through all of them.

            If we found an entry, we are good.
            If not, create a new entry.
        [...]
    [...]

This approach, while correct, isn't optimal, since we would potentially loop over a wide range of stackTable entries that couldn't possibly be the one we were looking for.

With this commit, the algorithm becomes more or less like this:

    For each PP:
        For each frame in the PP, from top to bottom:
            Loop over the stackTable entries whose prefix is the same as
            the one we're looking for.

            If we found an entry, we are good.
            If not, create a new entry, also updating the list of
            entries with our prefix.
        [...]
    [...]

With this, we avoid looking over a very big number of entries whose prefix are different from ours. The effect is that of switching from a linear search on a list to following pointers on a tree structure.

The list of entries with the same prefix is implemented as a JS Map of prefixes to lists of indexes into the stackTable.

With this change, big DHAT profiles take a lot less time to load (e.g. a 20MB profile would take ~2min before, now it takes ~2.5s :D).

This is the profile I tested with: dhat.out.3106 🙂

Sorry if I messed something up, I'm not particularly acquainted with JS (I even had a fight with the type checker, despite we having the same name! :P )

This commit changes the way in which the stackTable tree is constructed
in DHAT's import algorithm.

Previously, the algorithm would look a bit like this:
    For each PP:
        For each frame in the PP, from top to bottom:
            Loop over all existing stackTable entries, until we find the
            one we want, or go through all of them.

            If we found an entry, we are good.
            If not, create a new entry.
        [...]
    [...]

This approach, while correct, isn't optimal, since we would potentially
loop over a wide range of stackTable entries that couldn't possibly be
the one we were looking for.

With this commit, the algorithm becomes more or less like this:
    For each PP:
        For each frame in the PP, from top to bottom:
            Loop over the stackTable entries whose prefix is the same as
            the one we're looking for.

            If we found an entry, we are good.
            If not, create a new entry, also updating the list of
            entries with our prefix.
        [...]
    [...]

With this, we avoid looking over a very big number of entries whose
prefix are different from ours. The effect is that of switching from a
linear search on a list to following pointers on a tree structure.

The list of entries with the same prefix is implemented as a JS Map of
prefixes to lists of indexes into the stackTable.

With this change, big DHAT profiles take a lot less time to load (e.g. a
20MB profile would take ~2min before, now it takes ~2.5s :D).

Signed-off-by: flow <flowlnlnln@gmail.com>
@codecov

codecov Bot commented May 28, 2023

Copy link
Copy Markdown

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c3f2c3f) 88.61% compared to head (bed5b29) 88.62%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4640   +/-   ##
=======================================
  Coverage   88.61%   88.62%           
=======================================
  Files         294      294           
  Lines       26095    26101    +6     
  Branches     7035     7036    +1     
=======================================
+ Hits        23125    23131    +6     
  Misses       2765     2765           
  Partials      205      205           
Impacted Files Coverage Δ
src/profile-logic/import/dhat.js 98.31% <100.00%> (+0.08%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@julienw julienw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a very good addition indeed, thanks for the dramatic speedup! I see we're mostly getting rid of the stack table loop by using this tree instead. It makes sense for this case where we're scanning the stack table repeatidly, even though this duplicates the data in these 2 structures.

I left a few comments to make the code a bit easier to follow. That shouldn't change how the algorithm works, rather just put together the things that work together.
Tell me what you think!

Comment thread src/profile-logic/import/dhat.js Outdated
break;
if (candidateStackTables) {
// Start searching for a stack index.
stackIndex = stackTable.length;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think I'd convert this to let stackIndex = null right before the if. Then in the if below we can check for if (stackIndex === null) { stackIndex = stackTable.length; ... }. I believe this would make this code more explicit. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While implementing your other suggestion (#4640 (comment)), I found out that it will only work if we use stackIndex = stackTable.length every time, not just when entering this loop.

This is due to the old value of stackIndex, from a previous iteration of the loop, not being equal to stackTable.length, while candidateStackTables is undefined (so, we are in this prefix for the first time). So, it will neither enter this loop, nor create a new stackTable entry, which causes issues.

Because of that, I think it's best to simply move this stackIndex = stackTable.length line up, and add a comment explaining that that is a fallback when we don't enter this loop, or we enter it but don't find a match.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm I feel like my suggestion here would still work but what you did works too, and the comment makes it clear enough, so I don't mind.

Comment on lines -258 to +263
let stackIndex = -1;
let stackIndex = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this value isn't used outside of the loop now, so it can be moved below. Also look at the other comment about the same stackIndex variable below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is still used one time after the loop, to tell the ending stackIndex of the PP entry to the allocationsTable!

Comment thread src/profile-logic/import/dhat.js Outdated
// Since we just created a stack index, the next frames necessarily don't have an existing stack index.
candidateStackTables = [];
} else {
candidateStackTables = postfix.get(stackIndex);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can run candidateStackTables = postfix.get(prefix) at the start of the loop (right before if (candidateStackTables)), and get rid of this line as well as the previous line and the line let candidateStackTables = postfix.get(null); before the loop.

Comment thread src/profile-logic/import/dhat.js Outdated
stackTable.category.push(otherSubCategory);
stackTable.prefix.push(prefix);

const candidateList = postfix.get(prefix);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will already be candidateStackTables, with the changes I suggest, right? So we'll be able to reuse it directly.
(with your current patch, it's not always the case, especially when using the empty array line 310)

@mstange

mstange commented May 30, 2023

Copy link
Copy Markdown
Contributor

It looks like a hashmap with string keys of the type ${prefix}-${frame} is even faster, and also simpler.

@mstange

mstange commented May 30, 2023

Copy link
Copy Markdown
Contributor

It looks like a hashmap with string keys of the type ${prefix}-${frame} is even faster, and also simpler.

Ah, this approach would change behavior because we would no longer be de-duplicating stacks for frames of the same func. So I think it's fine to go ahead with the approach in this PR. I can create a follow-up PR later to implement the hashmap approach. What you have now is a huge improvement and worth shipping.

It seems like there are other problems with the importer on the provided profile - the regular expression doesn't match due to missing column numbers (for the functions with file+line information) and due to missing filenames (for the functions from libraries without debug info).

This makes some misc. changes to make it a bit easier to understand the
code added in the previous commit. This doesn't (i hope) change any behaviours!

Signed-off-by: flow <flowlnlnln@gmail.com>
@flowln flowln force-pushed the optimize_dhat_import branch from 6c42a8c to bed5b29 Compare May 30, 2023 21:16
@julienw julienw self-requested a review May 31, 2023 09:29

@julienw julienw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, this looks good to me now!

@julienw julienw enabled auto-merge (squash) June 6, 2023 16:01
@julienw julienw merged commit bf0526d into firefox-devtools:main Jun 6, 2023
@canova canova mentioned this pull request Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants