Add quantized op support to llama runner#3062
larryliu0820 wants to merge 7 commits into gh/larryliu0820/27/base from
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3062
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 41abbb5 with merge base 458d743, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D56197863](https://our.internmc.facebook.com/intern/diff/D56197863) [ghstack-poisoned]
-DEXECUTORCH_BUILD_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK="$XNNPACK" \
-DEXECUTORCH_BUILD_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_QUANTIZED="$QE" \
Can we not just always build it? Having so many build options feels like an additional burden for users. Maybe make it default opt-in?
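One possible reading of the "default opt-in" suggestion, sketched at the script level: the toggle defaults to ON unless the caller overrides it. `QE` and the two `-D` flags come from the diff above; the `ON` default and the `cmake_flags` helper are hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Assumption: treat an unset or empty QE as ON, so users only set it to opt out.
QE="${QE:-ON}"

# Hypothetical helper that prints the CMake flags the build step would receive.
cmake_flags() {
  printf '%s\n' \
    "-DEXECUTORCH_BUILD_OPTIMIZED=ON" \
    "-DEXECUTORCH_BUILD_QUANTIZED=${QE}"
}

cmake_flags
```

With this pattern, `QE=OFF ./script.sh` would still disable the quantized build, while a plain invocation enables it.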
EXPORT_ARGS="${EXPORT_ARGS} -kv --use_sdpa_with_kv_cache -X -qmode 8da4w -G 128"
EXPORT_ARGS="-c stories110M.pt -p ${PARAMS} -d ${DTYPE} -n ${EXPORTED_MODEL_NAME} -kv"
if [[ "${XNNPACK}" == "ON" ]]; then
  EXPORT_ARGS="${EXPORT_ARGS} -X -qmode 8da4w -G 128"
nit: does += operator work?
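For reference, bash does support `+=` for appending to a string variable, so the reassignment form in the diff could be shortened. A minimal sketch, where the values of `PARAMS`, `DTYPE`, `EXPORTED_MODEL_NAME`, and `XNNPACK` are placeholders chosen for illustration and are not taken from the CI script:

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only; the real script sets these elsewhere.
PARAMS="params.json"
DTYPE="fp32"
EXPORTED_MODEL_NAME="llama2.pte"
XNNPACK="ON"

EXPORT_ARGS="-c stories110M.pt -p ${PARAMS} -d ${DTYPE} -n ${EXPORTED_MODEL_NAME} -kv"
if [[ "${XNNPACK}" == "ON" ]]; then
  # Same result as: EXPORT_ARGS="${EXPORT_ARGS} -X -qmode 8da4w -G 128"
  EXPORT_ARGS+=" -X -qmode 8da4w -G 128"
fi

echo "${EXPORT_ARGS}"
```

Note that `+=` on variables is a bash feature (also available in zsh and ksh) and is not part of POSIX `sh`, so it is only safe in scripts that run under bash.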
if(EXECUTORCH_USE_TIKTOKEN)
  # find RE2 for tokenizer
  set(ABSL_ENABLE_INSTALL ON)
  set(ABSL_PROPAGATE_CXX_STD ON)
Oh, we depend on abseil for tiktoken?
Yeah, tiktoken -> re2 -> abseil
if(EXECUTORCH_USE_TIKTOKEN)
  # find RE2 for tokenizer
  set(ABSL_ENABLE_INSTALL ON)
  set(ABSL_PROPAGATE_CXX_STD ON)
No tests are using this path yet, right?
  EXPORT_ARGS="${EXPORT_ARGS} --use_sdpa_with_kv_cache"
fi
if [[ "${QE}" == "ON" ]]; then
  EXPORT_ARGS="${EXPORT_ARGS} --embedding-quantize 8,1024"
This pull request has been merged in 1f4b631.
Summary: As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Given the CI job failure in `build-apple-framework`, it is probably worth adding it back.
Pull Request resolved: #3115
Test Plan: See that CI job pass
Reviewed By: shoumikhin
Differential Revision: D56281923
Pulled By: larryliu0820
fbshipit-source-id: e6ad411f763ff8e11d4fb1e0bc7037eb2cf69357
Stack from ghstack (oldest at bottom):
Differential Revision: D56197863