[Dlight] Enhance Decode-GEMV Schedule #15195
Merged
MasterJH5574 merged 1 commit into apache:unity on Jul 5, 2023
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Branch updated from 0596441 to 5d715aa.
Branch updated from dc62d15 to 0d8ff66.
This PR enhances the Decode-GEMV rule with the following changes:
- Normalize the GEMV iter domain to S-R-C via transform-block-layout. This helps further analysis and scheduling, for example when the original reduction block has no spatial loop.
- Remove the ad hoc iter type analysis, including the logic that called into the TVM packed func `tir.schedule.GetLoopIterType` via `tvm._ffi.get_global_func`.
- Split the scheduling logic into two separate cases, depending on whether the innermost dimension is spatial or reduction.
- Introduce `suggest_threads_per_block` to estimate how many threads to allocate per thread block. This avoids the previous behavior where dlight allocated 256 threads for a workload whose degree of parallelism is only 128.
- Misc improvements.

The rest of the changes were split out into separate PRs that are already merged to main:
- [x] Pass hints to the arithmetic analyzer that shape variables are positive (apache#15210, "[TIR][Schedule] Derive Nonnegative Bounds from Shape Var")
- [x] Eliminate unnecessary block predicate generation, which should be provable via affine analysis (apache#15193, "[ARITH] Allow Analyzer to MarkGlobalNonNegValue")
- [x] Shrink the local memory allocation if only one element `X[threadIdx.x]` is used (apache#15207, "[TIR][Transform] Add LiftThreadBinding Pass")
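As a rough illustration of the normalization and case-split ideas, the sketch below groups a block's iterators by kind, fuses each group into a single axis, and inserts a unit-extent spatial axis when the block has none. It is a simplified, hypothetical model in plain Python: `IterVar` and `normalize_s_r` are illustrative names, not dlight's API, and the sketch models only the S and R groups rather than the full S-R-C layout that the PR produces with transform-block-layout.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass
class IterVar:
    """Hypothetical stand-in for a block iterator: a kind and an extent."""
    kind: str  # "S" for spatial, "R" for reduction
    extent: int


def normalize_s_r(iters: List[IterVar]) -> Tuple[int, int, bool]:
    """Fuse iterators into one spatial and one reduction axis.

    Returns (spatial_extent, reduction_extent, is_inner_reduction).
    A block without spatial iterators gets a unit spatial axis
    (prod of an empty sequence is 1), so later steps can assume S-R.
    """
    if not iters:
        raise ValueError("block has no iterators")
    for it in iters:
        if it.kind not in ("S", "R"):
            raise ValueError(f"unsupported iterator kind: {it.kind}")
    s_extent = prod(it.extent for it in iters if it.kind == "S")
    r_extent = prod(it.extent for it in iters if it.kind == "R")
    # Selects between the two scheduling cases: innermost reduction vs. spatial.
    is_inner_reduction = iters[-1].kind == "R"
    return s_extent, r_extent, is_inner_reduction
```

For example, a reduction-only block `[IterVar("R", 4096)]` normalizes to `(1, 4096, True)`: the unit spatial axis gives later scheduling steps a uniform shape to work with.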
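The motivation for `suggest_threads_per_block` is that a thread block should not be larger than the available parallelism. A minimal sketch of such a heuristic follows, assuming a simple rule (the largest power of two not exceeding both a hardware cap and the workload's degree of parallelism) that may differ from what dlight actually implements; `suggest_threads` and `max_threads` are hypothetical names.

```python
def suggest_threads(parallelism: int, max_threads: int = 256) -> int:
    """Hypothetical heuristic: the largest power of two that is no larger
    than both the hardware cap and the workload's degree of parallelism."""
    if parallelism < 1 or max_threads < 1:
        raise ValueError("parallelism and max_threads must be positive")
    limit = min(parallelism, max_threads)
    threads = 1
    # Double until the next doubling would exceed the limit.
    while threads * 2 <= limit:
        threads *= 2
    return threads
```

Under this rule, a workload with parallelism 128 gets 128 threads instead of 256, while a large workload is still capped at `max_threads`.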
Branch updated from 0d8ff66 to b25bd0b.
MasterJH5574 approved these changes on Jul 5, 2023
tqchen reviewed on Jul 5, 2023
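```python
from typing import List

from tvm import tir

dynamic: List[int] = []
for i, loop in enumerate(loops):
    loop_extent = loop.extent
    if isinstance(loop_extent, tir.IntImm):
        ...  # remainder of the loop body is not shown in this review excerpt
```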
We should be able to factor the loop extent into a constant component and a dynamic component; this would handle extents like `32 * n`.
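The reviewer's suggestion can be illustrated with a toy expression model: instead of treating every non-`IntImm` extent as fully dynamic, walk the product and separate the constant coefficient from the symbolic factors, so `32 * n` still exposes the static factor 32. The `Var` and `Mul` classes and `split_extent` are hypothetical stand-ins for TIR expressions, not TVM APIs; a real implementation would presumably operate on TIR `PrimExpr` nodes instead.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """Hypothetical symbolic shape variable, e.g. a dynamic sequence length."""
    name: str


@dataclass(frozen=True)
class Mul:
    """Hypothetical product node, standing in for a TIR multiplication."""
    a: "Expr"
    b: "Expr"


Expr = Union[int, Var, Mul]


def split_extent(e: Expr) -> Tuple[int, Tuple[Var, ...]]:
    """Factor a product extent into (constant coefficient, dynamic factors)."""
    if isinstance(e, int):
        return e, ()
    if isinstance(e, Var):
        return 1, (e,)
    if isinstance(e, Mul):
        const_a, dyn_a = split_extent(e.a)
        const_b, dyn_b = split_extent(e.b)
        return const_a * const_b, dyn_a + dyn_b
    raise TypeError(f"unsupported extent expression: {e!r}")
```

With `n = Var("n")`, `split_extent(Mul(32, n))` returns `(32, (n,))`: the constant part could still inform decisions such as thread binding, while only `n` is treated as unknown.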