DHAT: Optimize import time for large profiles #4640
This commit changes the way in which the stackTable tree is constructed
in DHAT's import algorithm.
Previously, the algorithm would look a bit like this:
  For each PP:
    For each frame in the PP, from top to bottom:
      Loop over all existing stackTable entries, until we find the
      one we want, or go through all of them.
      If we found an entry, we are good.
      If not, create a new entry.
      [...]
    [...]
This approach, while correct, isn't optimal, since we would potentially
loop over a wide range of stackTable entries that couldn't possibly be
the one we were looking for.
With this commit, the algorithm becomes more or less like this:
  For each PP:
    For each frame in the PP, from top to bottom:
      Loop over the stackTable entries whose prefix is the same as
      the one we're looking for.
      If we found an entry, we are good.
      If not, create a new entry, also updating the list of
      entries with our prefix.
      [...]
    [...]
With this, we avoid looking over a very large number of entries whose
prefixes differ from ours. The effect is that of switching from a
linear search on a list to following pointers on a tree structure.
The list of entries with the same prefix is implemented as a JS Map of
prefixes to lists of indexes into the stackTable.
With this change, big DHAT profiles take a lot less time to load (e.g. a
20MB profile would take ~2min before, now it takes ~2.5s :D).
Signed-off-by: flow <flowlnlnln@gmail.com>
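To make the difference between the two lookups concrete, here is a minimal sketch. It is not the importer's actual code: the stack table is reduced to two parallel arrays, frames are plain strings, and the names makeStackTable, findOrCreateLinear, findOrCreateIndexed, childrenByPrefix and insertStack are all hypothetical. Only the idea (a JS Map from prefix to a list of stackTable indexes) comes from the commit message above.

```javascript
// Hypothetical, simplified stack table: parallel `frame` and `prefix` arrays.
function makeStackTable() {
  return { frame: [], prefix: [], length: 0 };
}

function pushStack(stackTable, prefix, frame) {
  stackTable.frame.push(frame);
  stackTable.prefix.push(prefix);
  return stackTable.length++;
}

// Before: scan every existing entry for a matching (prefix, frame) pair.
function findOrCreateLinear(stackTable, prefix, frame) {
  for (let i = 0; i < stackTable.length; i++) {
    if (stackTable.prefix[i] === prefix && stackTable.frame[i] === frame) {
      return i;
    }
  }
  return pushStack(stackTable, prefix, frame);
}

// After: only scan the entries already known to share our prefix.
// `childrenByPrefix` maps a prefix (null for roots) to a list of stack indexes.
function findOrCreateIndexed(stackTable, childrenByPrefix, prefix, frame) {
  let candidates = childrenByPrefix.get(prefix);
  if (candidates === undefined) {
    candidates = [];
    childrenByPrefix.set(prefix, candidates);
  }
  for (const i of candidates) {
    if (stackTable.frame[i] === frame) {
      return i;
    }
  }
  const newIndex = pushStack(stackTable, prefix, frame);
  candidates.push(newIndex);
  return newIndex;
}

// Walk one call stack, chaining each frame onto the previous stack index.
function insertStack(frames, findOrCreate) {
  let prefix = null;
  for (const frame of frames) {
    prefix = findOrCreate(prefix, frame);
  }
  return prefix;
}

// Both strategies should build the same table from the same input.
const linear = makeStackTable();
const indexed = makeStackTable();
const childrenByPrefix = new Map();
const sampleStacks = [
  ["main", "parse", "alloc"],
  ["main", "parse", "realloc"],
  ["main", "free"],
];
for (const frames of sampleStacks) {
  insertStack(frames, (p, f) => findOrCreateLinear(linear, p, f));
  insertStack(frames, (p, f) =>
    findOrCreateIndexed(indexed, childrenByPrefix, p, f)
  );
}
```

In this sketch the indexed version only ever compares against siblings under the same prefix, so the cost of a lookup depends on how many children a node has rather than on the size of the whole table.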
Codecov Report
Patch coverage:

@@           Coverage Diff           @@
##             main    #4640   +/-   ##
=======================================
  Coverage   88.61%   88.62%
=======================================
  Files         294      294
  Lines       26095    26101    +6
  Branches     7035     7036    +1
=======================================
+ Hits        23125    23131    +6
  Misses       2765     2765
  Partials      205      205
julienw
left a comment
Thanks, this looks like a very good addition indeed, and thanks for the dramatic speedup! I see we're mostly getting rid of the stack table loop by using this tree instead. It makes sense for this case where we're scanning the stack table repeatedly, even though this duplicates the data across these two structures.
I left a few comments to make the code a bit easier to follow. That shouldn't change how the algorithm works, rather just put together the things that work together.
Tell me what you think!
    break;
    if (candidateStackTables) {
      // Start searching for a stack index.
      stackIndex = stackTable.length;
nit: I think I'd convert this to let stackIndex = null right before the if. Then in the if below we can check for if (stackIndex === null) { stackIndex = stackTable.length; ... }. I believe this would make this code more explicit. What do you think?
While implementing your other suggestion (#4640 (comment)), I found out that it will only work if we use stackIndex = stackTable.length every time, not just when entering this loop.
This is because the old value of stackIndex, left over from a previous iteration of the loop, is not equal to stackTable.length while candidateStackTables is undefined (that is, we are seeing this prefix for the first time). So the code neither enters this loop nor creates a new stackTable entry, which causes issues.
Because of that, I think it's best to simply move this stackIndex = stackTable.length line up, and add a comment explaining that it is a fallback for when we don't enter this loop, or we enter it but don't find a match.
Mmm I feel like my suggestion here would still work but what you did works too, and the comment makes it clear enough, so I don't mind.
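For readers following this thread, here is a rough sketch of the control flow being discussed, with the fallback assigned on every iteration. It is a simplified stand-in, not the code from the patch: the stack table only has frame and prefix columns, frames are plain numbers, and the importStack function is hypothetical. The names stackIndex, candidateStackTables and postfix are borrowed from the diff.

```javascript
// Hypothetical, simplified version of the per-PP frame loop.
// `postfix` maps a prefix (null for roots) to the stack indexes extending it.
function importStack(frames, stackTable, postfix) {
  let prefix = null;
  let stackIndex = 0;

  for (const frame of frames) {
    // Fallback: assume there is no match, so the index is the next free slot.
    // This is reset on every iteration, including when there are no
    // candidates to scan, so a stale value can never suppress the insert.
    stackIndex = stackTable.length;

    const candidateStackTables = postfix.get(prefix);
    if (candidateStackTables) {
      for (const candidate of candidateStackTables) {
        if (stackTable.frame[candidate] === frame) {
          stackIndex = candidate;
          break;
        }
      }
    }

    if (stackIndex === stackTable.length) {
      // No match found: create the entry and register it under its prefix.
      stackTable.frame.push(frame);
      stackTable.prefix.push(prefix);
      stackTable.length++;
      if (candidateStackTables) {
        candidateStackTables.push(stackIndex);
      } else {
        postfix.set(prefix, [stackIndex]);
      }
    }

    prefix = stackIndex;
  }

  // The last index identifies the whole stack for the caller.
  return stackIndex;
}

const stackTable = { frame: [], prefix: [], length: 0 };
const postfix = new Map();
const first = importStack([10, 11, 12], stackTable, postfix);
const second = importStack([10, 11, 13], stackTable, postfix);
const repeat = importStack([10, 11, 12], stackTable, postfix);
// Extends an existing stack under a prefix that has no children yet. Without
// the per-iteration reset, stackIndex would still hold the matched index from
// the previous frame and no entry would be created for frame 14.
const deeper = importStack([10, 11, 12, 14], stackTable, postfix);
```

The last call is the case described above: the prefix exists, but nothing has been registered under it yet, so the candidate list is undefined.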
    let stackIndex = -1;
    let stackIndex = 0;
I think this value isn't used outside of the loop now, so it can be moved below. Also look at the other comment about the same stackIndex variable below.
This variable is still used one time after the loop, to tell the ending stackIndex of the PP entry to the allocationsTable!
      // Since we just created a stack index, the next frames necessarily don't have an existing stack index.
      candidateStackTables = [];
    } else {
      candidateStackTables = postfix.get(stackIndex);
I think you can run candidateStackTables = postfix.get(prefix) at the start of the loop (right before if (candidateStackTables)), and get rid of this line as well as the previous line and the line let candidateStackTables = postfix.get(null); before the loop.
    stackTable.category.push(otherSubCategory);
    stackTable.prefix.push(prefix);

    const candidateList = postfix.get(prefix);
I believe this will already be candidateStackTables, with the changes I suggest, right? So we'll be able to reuse it directly.
(with your current patch, it's not always the case, especially when using the empty array line 310)
It looks like a hashmap with string keys of the type
Ah, this approach would change behavior because we would no longer be de-duplicating stacks for frames of the same func. So I think it's fine to go ahead with the approach in this PR. I can create a follow-up PR later to implement the hashmap approach. What you have now is a huge improvement and worth shipping.
It seems like there are other problems with the importer on the provided profile - the regular expression doesn't match due to missing column numbers (for the functions with file+line information) and due to missing filenames (for the functions from libraries without debug info).
This makes some misc. changes to make it a bit easier to understand the code added in the previous commit. This doesn't (I hope) change any behaviours!
Signed-off-by: flow <flowlnlnln@gmail.com>
6c42a8c to bed5b29
julienw
left a comment
Thanks for the updates, this looks good to me now!
This is the profile I tested with: dhat.out.3106 🙂
Sorry if I messed something up, I'm not particularly acquainted with JS (I even had a fight with the type checker, despite us having the same name! :P )