Qualcomm AI Engine Direct - Delegated mutable buffer #6727
shewu-quic wants to merge 2 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6727
❌ 1 New Failure as of commit 89af1e0 with merge base 86cb5d7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, thank you very much :)
cccclai left a comment:
This looks very solid, thanks!
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hey, it seems like this is breaking the CI.
It seems something goes wrong when delegating a mutable buffer without quantization.
It seems that the delegated mutable buffer is not removed from the output.
Summary:
- Support copy op with QNN Reshape
- Consume mutable buffer in QNN Delegate
- Set the same memory address for I/O of mutable buffer at runtime
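For context, here is a minimal sketch of the kind of module this change targets, assuming only stock PyTorch: a buffer mutated inside `forward` becomes a mutable buffer after `torch.export`, surfacing both as a lifted input and as an extra graph output that the delegate must write back in place. The module and names below are illustrative, not taken from this PR.

```python
import torch

class KVCacheLike(torch.nn.Module):
    """Illustrative module whose forward mutates a registered buffer."""

    def __init__(self):
        super().__init__()
        self.register_buffer("cache", torch.zeros(1, 16))

    def forward(self, x):
        # The in-place add makes `cache` a mutable buffer after export:
        # it appears both as a lifted input and as an extra output.
        self.cache.add_(x)
        return self.cache.clone()

ep = torch.export.export(KVCacheLike(), (torch.randn(1, 16),))
# Maps each mutated output node back to the buffer it writes. This is
# the state the QNN delegate consumes, with the buffer's input and
# output sharing one memory address at runtime.
print(ep.graph_signature.buffers_to_mutate)
```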
shewu-quic force-pushed from 7f236c3 to da1df61
This PR needs a `release notes:` label. If the changes are not user facing, please add the `topic: not user facing` label instead.
@pytorchbot label "topic: not user facing"
Didn't find following labels among repository labels: topic: not user facing
shewu-quic force-pushed from da1df61 to 89af1e0
It seems like the label check is new... I'll check how to resolve it.
Hello, is this PR still needed? Assuming yes, but we're focusing on static llama now... |
Yes, I think we can close it. Thanks. |
…le buffer issue (#11782)
Summary:
- Add a parameter to support mutable buffer delegation in QNN Backend
- Set the same memory address for I/O of mutable buffer at runtime
- Ref: #6727
- Avoid annotating the input node, because mutable buffers will be folded during the convert_pt2e process
- Deprecate use_legacy_export in executorch llama

cc @cccclai @winskuo-quic @cbilgin
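To make the convert_pt2e point in the summary above concrete, here is a hedged sketch of the PT2E quantization flow, assuming the ExecuTorch Qualcomm backend and the QNN SDK are installed. `QnnQuantizer` is the backend's quantizer class; the module, calibration step, and surrounding glue are illustrative, not the PR's actual code.

```python
import torch
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Reuse the KVCacheLike module from the earlier sketch: a module whose
# forward mutates a registered buffer in place.
model = KVCacheLike()
example_inputs = (torch.randn(1, 16),)

# Export, then run the standard PT2E prepare/convert pipeline.
exported = torch.export.export_for_training(model, example_inputs).module()
prepared = prepare_pt2e(exported, QnnQuantizer())
prepared(*example_inputs)  # one calibration pass

# Mutable buffers are constant-folded during convert_pt2e, which is why
# the summary says the input node feeding the buffer must not be annotated.
converted = convert_pt2e(prepared)
```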
Summary:
Test the PR for llama 3.2 1B instruct with seq_len=512 on SM8650
[test result screenshots omitted]
Test the mainline