Merged
Cast optimization improvement: also handle local.tee
Converts the following:

 (some.operation
   (ref.cast .. (local.tee $ref ..))
   (local.get $ref)
 )

into:

 (some.operation
   (local.tee $temp
     (ref.cast .. (local.tee $ref ..))
   )
   (local.get $temp)
 )

This removes close to 10% of the casts in a test program (from 38113
casts down to 36119).

I also tried to improve the optimization that moves more refined casts
earlier, but this does not seem very effective (it only eliminated 9
further casts), so I'm not sure it is worth it.
vouillon committed Apr 17, 2024
commit 6cc2f1bf7a4746c0339483ebf59fe01f916ac17e
50 changes: 23 additions & 27 deletions src/passes/OptimizeCasts.cpp
@@ -92,22 +92,6 @@
 // Note that right now, we only consider RefAs with op RefAsNonNull as a cast.
 // RefAs with ExternInternalize and ExternExternalize are not considered casts
 // when obtaining fallthroughs, and so are ignored.
-//
-// TODO: 1. Look past individual basic blocks? This may be worth considering
-//          given the pattern of a cast appearing in an if condition that is
-//          then used in an if arm, for example, where simple dominance shows
-//          the cast can be reused.
-// TODO: 2. Look at LocalSet as well and not just Get. That would add some
-//          overlap with the other passes mentioned above (SimplifyLocals and
-//          RedundantSetElimination also track sets and can switch a get to use
-//          a better set's index when that refines the type). But once we do the
-//          first two TODOs above then we'd be adding some novel things here,
-//          as we could optimize "backwards" as well (TODO 1) and past basic
-//          blocks (TODO 2, though RedundantSetElimination does that as well).
-//          However, we should consider whether improving those other passes
-//          might make more sense (as it would help more than casts, if we could
-//          make them operate "backwards" and/or past basic blocks).
Member
The first TODO is still relevant, isn't it?

Contributor Author
Oh, it is indeed only partially done (with connectAdjacentBlocks set to true).

-//
 
 #include "ir/effects.h"
 #include "ir/linear-execution.h"
@@ -453,20 +437,32 @@ struct BestCastFinder : public LinearExecutionWalker<BestCastFinder> {
   void visitRefCast(RefCast* curr) { handleRefinement(curr); }
 
   void handleRefinement(Expression* curr) {
+    auto* teeFallthrough = Properties::getFallthrough(
+      curr, options, *getModule(), Properties::FallthroughBehavior::NoTeeBrIf);
+    if (auto* set = teeFallthrough->dynCast<LocalSet>()) {
Member
Suggested change
-    if (auto* set = teeFallthrough->dynCast<LocalSet>()) {
+    if (auto* tee = teeFallthrough->dynCast<LocalSet>()) {

Also the isTee check below is not needed, as a fallthrough LocalSet must be a tee (because only a tee flows out a value that can fall through to some place).

+      if (set->isTee()) {
+        updateBestCast(curr, set->index);
+      }
+    }
     auto* fallthrough = Properties::getFallthrough(curr, options, *getModule());
Member
Suggested change
-    auto* fallthrough = Properties::getFallthrough(curr, options, *getModule());
+    auto* fallthrough = Properties::getFallthrough(teeFallthrough, options, *getModule());

This saves a little work: the first call has already looked through part of the fallthrough chain, so the second call can continue from teeFallthrough instead of redoing that work.
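
As a rough illustration of the saving described in this comment, here is a minimal sketch using a toy node chain. `ToyNode` and `toyFallthrough` are hypothetical names invented for this example; they are not Binaryen's `Expression` or `Properties::getFallthrough`, and the `stopAtTee` flag only loosely stands in for `NoTeeBrIf`.

```cpp
#include <cassert>

// Hypothetical toy expression node; not Binaryen's Expression class.
struct ToyNode {
  enum Kind { Cast, Tee, Get } kind;
  const ToyNode* child; // the value that flows through this node, or nullptr
};

// Follows the chain of values flowing through `node`. When `stopAtTee` is
// set, the walk halts at the first tee it reaches (an assumption standing in
// for NoTeeBrIf). `steps` is incremented once per node stepped past.
inline const ToyNode* toyFallthrough(const ToyNode* node,
                                     bool stopAtTee,
                                     int& steps) {
  while (node->child && !(stopAtTee && node->kind == ToyNode::Tee)) {
    node = node->child;
    ++steps;
  }
  return node;
}
```

For a chain cast -> tee -> get, walking to the tee and then resuming from it steps past two nodes in total, while walking to the tee and then restarting from the cast steps past three. Both approaches end at the same get in this toy model.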

     if (auto* get = fallthrough->dynCast<LocalGet>()) {
-      auto*& bestCast = mostCastedGets[get->index];
-      if (!bestCast) {
-        // This is the first.
-        bestCast = curr;
-        return;
-      }
+      updateBestCast(curr, get->index);
+      return;
Member
Suggested change
-      return;

+    }
+  }
 
-      // See if we are better than the current best.
-      if (curr->type != bestCast->type &&
-          Type::isSubType(curr->type, bestCast->type)) {
-        bestCast = curr;
-      }
+  void updateBestCast(Expression* curr, Index index) {
+    auto*& bestCast = mostCastedGets[index];
+    if (!bestCast) {
+      // This is the first.
+      bestCast = curr;
+      return;
+    }
+
+    // See if we are better than the current best.
+    if (curr->type != bestCast->type &&
+        Type::isSubType(curr->type, bestCast->type)) {
+      bestCast = curr;
+    }
   }
 };
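
The "keep the most refined cast per local" rule in `updateBestCast` can be tried in isolation with a small self-contained sketch. Everything here is a stand-in invented for illustration: `ToyTypes`, `ToyCast`, and `ToyBestCasts` are hypothetical, types are plain integers with a single-parent hierarchy, and none of it is Binaryen's `Type` or `Expression` API.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical single-inheritance type table: parent[t] is the direct
// supertype of t, or -1 for a root type.
struct ToyTypes {
  std::vector<int> parent;

  int add(int super) {
    parent.push_back(super);
    return static_cast<int>(parent.size()) - 1;
  }

  // True when `sub` equals `super` or has it as an ancestor.
  bool isSubType(int sub, int super) const {
    for (int t = sub; t != -1; t = parent[t]) {
      if (t == super) {
        return true;
      }
    }
    return false;
  }
};

// Hypothetical cast expression: only its result type matters here.
struct ToyCast {
  int type;
};

// Tracks, per local index, the most refined cast seen so far.
struct ToyBestCasts {
  const ToyTypes& types;
  std::unordered_map<unsigned, const ToyCast*> best;

  explicit ToyBestCasts(const ToyTypes& t) : types(t) {}

  void update(const ToyCast* cast, unsigned local) {
    const ToyCast*& slot = best[local];
    if (!slot) {
      // This is the first cast recorded for this local.
      slot = cast;
      return;
    }
    // Replace the current best only with a strictly more refined type.
    if (cast->type != slot->type && types.isSubType(cast->type, slot->type)) {
      slot = cast;
    }
  }
};
```

With a hierarchy any > A > B, recording a cast to A, then one to any, then one to B for the same local leaves the cast to B as the best; the cast to any never displaces the cast to A.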
37 changes: 37 additions & 0 deletions test/lit/passes/optimize-casts.wast
@@ -1352,6 +1352,43 @@
     )
   )
 
+  ;; CHECK:      (func $local-tee (type $2) (param $x (ref struct))
+  ;; CHECK-NEXT:  (local $y (ref struct))
+  ;; CHECK-NEXT:  (local $2 (ref $A))
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (local.tee $2
+  ;; CHECK-NEXT:    (ref.cast (ref $A)
+  ;; CHECK-NEXT:     (local.tee $y
+  ;; CHECK-NEXT:      (local.get $x)
+  ;; CHECK-NEXT:     )
+  ;; CHECK-NEXT:    )
+  ;; CHECK-NEXT:   )
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (local.get $2)
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (local.get $2)
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT: )
+  (func $local-tee (param $x (ref struct))
+    (local $y (ref struct))
+    ;; We should use the cast value after it has been computed, in both gets.
+    (drop
+      (ref.cast (ref $A)
+        (local.tee $y
+          (local.get $x)
+        )
+      )
+    )
+    (drop
+      (local.get $x)
+    )
+    (drop
+      (local.get $y)
+    )
+  )
+
   ;; CHECK:      (func $get (type $11) (result (ref struct))
   ;; CHECK-NEXT:  (unreachable)
   ;; CHECK-NEXT: )
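
The expectation in the `$local-tee` test (both later gets read the temp local `$2`) can be modeled with a toy linear rewrite. This is only a sketch under simplifying assumptions: `ToyInstr` and `reuseCasts` are hypothetical names, every cast gets a fresh temp, types are ignored, and any set to a local simply invalidates its cached cast. The real pass is more selective than this.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical linear instruction. For Cast, `locals` lists every local whose
// value the cast operand carries (via a local.get or a local.tee). For Get
// and Set, `locals` holds the single local being read or written.
struct ToyInstr {
  enum Kind { Cast, Get, Set } kind;
  std::vector<unsigned> locals;
};

// Redirects each Get to the temp holding the latest cast of that local's
// value. Temps are numbered upward from `firstTemp`.
inline std::vector<ToyInstr> reuseCasts(const std::vector<ToyInstr>& in,
                                        unsigned firstTemp) {
  std::unordered_map<unsigned, unsigned> tempFor;
  std::vector<ToyInstr> out;
  unsigned nextTemp = firstTemp;
  for (const ToyInstr& instr : in) {
    switch (instr.kind) {
      case ToyInstr::Cast: {
        // The cast result is saved in a new temp, which now stands in for
        // every local that fed the cast.
        unsigned temp = nextTemp++;
        for (unsigned local : instr.locals) {
          tempFor[local] = temp;
        }
        out.push_back(instr);
        break;
      }
      case ToyInstr::Get: {
        ToyInstr copy = instr;
        auto it = tempFor.find(instr.locals[0]);
        if (it != tempFor.end()) {
          copy.locals[0] = it->second;
        }
        out.push_back(copy);
        break;
      }
      case ToyInstr::Set: {
        // A new value is written, so the earlier cast no longer applies.
        tempFor.erase(instr.locals[0]);
        out.push_back(instr);
        break;
      }
    }
  }
  return out;
}
```

Taking `$x` as local 0 and `$y` as local 1, the test body corresponds to a Cast over locals {1, 0} followed by a Get of 0 and a Get of 1. With `firstTemp` set to 2, both gets are redirected to local 2, matching the CHECK lines.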