CANN: add q4_1 and q8_1 quantization support to the CANN backend #21
[CANN] Add support for the q4_1 and q8_1 quantization types
Summary
Add support for the q4_1 and q8_1 quantization types.
Problem description
The project does not yet support forward computation for the q4_1 and q8_1 quantization types on the CANN backend.
Motivations
Goals
Detailed Design
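To pass the MUL_MAT test cases, the main work is adding the block_q4_1 and block_q8_1 data types, the logic that converts them to and from the types supported by the CANN interface, and q4_1/q8_1 entry points in the call path.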
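For orientation, the two block types can be sketched in C after the upstream ggml definitions. This is an assumption, not the listing from this PR: `uint16_t` stands in for ggml's `ggml_half` (raw fp16 bits), and some older ggml versions stored `d` and `s` of `block_q8_1` as `float` instead.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch modeled on upstream ggml; not copied from this PR.
   uint16_t holds raw fp16 bits and is an assumed stand-in for ggml_half. */
#define QK4_1 32
#define QK8_1 32

typedef struct {
    uint16_t d;             /* per-block scale (fp16) */
    uint16_t m;             /* per-block minimum (fp16): the field q4_0 lacks */
    uint8_t  qs[QK4_1 / 2]; /* 32 unsigned 4-bit quants, two per byte */
} block_q4_1;

typedef struct {
    uint16_t d;             /* per-block scale (fp16) */
    uint16_t s;             /* d * sum(qs) (fp16): the field q8_0 lacks */
    int8_t   qs[QK8_1];     /* 32 signed 8-bit quants */
} block_q8_1;

/* Compile-time size checks, in the style upstream ggml uses for its block types. */
static_assert(sizeof(block_q4_1) == 2 * sizeof(uint16_t) + QK4_1 / 2, "unexpected q4_1 block size");
static_assert(sizeof(block_q8_1) == 2 * sizeof(uint16_t) + QK8_1,     "unexpected q8_1 block size");
```

Each type carries exactly one more half-precision value per block than its q4_0/q8_0 counterpart, which is the extra field the entry-point changes have to read.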
Definition of the q4_1 and q8_1 block formats:
Call path: ggml_cann_compute_forward -> ggml_cann_mul_mat -> ggml_cann_mul_mat_quant
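ggml_cann_mul_mat_quant is modified to add q4_1 and q8_1 entry points, along with logic to read and pass in the extra half-precision field that each new type carries (m in q4_1, s in q8_1).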
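A minimal sketch of what reading the extra fp16 field involves, modeled on ggml's reference q4_1 dequantization (low nibbles give elements 0 to 15, high nibbles elements 16 to 31). The helper names `fp16_to_fp32` and `dequantize_block_q4_1` are hypothetical, `uint16_t` again stands in for `ggml_half`, and this host-side loop is not the CANN path, which passes these fields to WeightQuantBatchMatmulV2 as tensors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK4_1 32

/* Assumed layout, as in upstream ggml; uint16_t stands in for ggml_half. */
typedef struct {
    uint16_t d;
    uint16_t m;
    uint8_t  qs[QK4_1 / 2];
} block_q4_1;

/* Hypothetical helper: IEEE 754 binary16 bits -> float. */
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        /* zero or subnormal: value = mant * 2^-24 (exact in float) */
        float f = (float)mant * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }
    if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);           /* inf or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);  /* rebias exponent 15 -> 127 */
    }
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Hypothetical helper: dequantize one q4_1 block as x = d * q + m. */
static void dequantize_block_q4_1(const block_q4_1 *b, float y[QK4_1]) {
    const float d = fp16_to_fp32(b->d);
    const float m = fp16_to_fp32(b->m);  /* the extra half-precision field */
    for (int j = 0; j < QK4_1 / 2; j++) {
        y[j]             = (float)(b->qs[j] & 0x0F) * d + m;
        y[j + QK4_1 / 2] = (float)(b->qs[j] >> 4)   * d + m;
    }
}
```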
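Next, to match the WeightQuantBatchMatmulV2 interface provided by the CANN backend, the corresponding data conversion logic is implemented for the new types. It covers two aspects, layout and values, and everything is ultimately converted into tensors of ACL-supported types before being passed in.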
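Layout: the blockwise-stored quantized weights are regrouped by element kind (quantized values, scales, and the extra per-block field), and a tensor is created for each group.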
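The layout step can be sketched as turning an array of blocks into separate contiguous buffers, one per field, plus the inverse. The function names are hypothetical, and the sketch keeps the nibbles packed exactly as stored; it does not attempt to reproduce any nibble reordering or int4 packing that the real ggml_cann.cpp code may apply for the ACL tensors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK4_1 32

/* Assumed layout, as in upstream ggml; uint16_t stands in for ggml_half. */
typedef struct {
    uint16_t d;
    uint16_t m;
    uint8_t  qs[QK4_1 / 2];
} block_q4_1;

/* Hypothetical: split n blocks (array of structs) into three contiguous
   buffers, one per field, so each buffer can back its own tensor. */
static void q4_1_split_fields(const block_q4_1 *blocks, size_t n,
                              uint8_t *qs_out, uint16_t *d_out, uint16_t *m_out) {
    for (size_t i = 0; i < n; i++) {
        memcpy(qs_out + i * (QK4_1 / 2), blocks[i].qs, QK4_1 / 2);
        d_out[i] = blocks[i].d;
        m_out[i] = blocks[i].m;
    }
}

/* Hypothetical inverse: rebuild the blockwise layout from the three buffers. */
static void q4_1_merge_fields(const uint8_t *qs_in, const uint16_t *d_in,
                              const uint16_t *m_in, size_t n, block_q4_1 *blocks) {
    for (size_t i = 0; i < n; i++) {
        memcpy(blocks[i].qs, qs_in + i * (QK4_1 / 2), QK4_1 / 2);
        blocks[i].d = d_in[i];
        blocks[i].m = m_in[i];
    }
}
```

The same split applies to q8_1 with `int8_t` quants and `s` in place of `m`.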
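Values: for q4_1, inverse-conversion logic is added for dequantization. It mainly consists of converting qs to signed values and then compensating the offset by adding 8*d. For q8_1, only the logic for reading fields inside a block changes; the values themselves are not converted.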
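A worked form of the 8*d compensation, assuming (as in the upstream ggml reference dequantization) that a q4_1 element is $x = d\,q + m$ with an unsigned nibble $q$, and reading the "offset" above as the per-block $m$:

```latex
\begin{aligned}
x  &= d\,q + m,              && q  \in \{0, \dots, 15\} \\
q' &= q - 8,                 && q' \in \{-8, \dots, 7\} \\
x  &= d\,(q' + 8) + m = d\,q' + (m + 8d)
\end{aligned}
```

Storing the signed $q'$ together with the offset $m + 8d$ therefore reproduces every element exactly in real arithmetic, and the inverse direction is $q = q' + 8$, $m = (m + 8d) - 8d$. The q8_1 quants are already signed 8-bit values ($x = d\,q$), which is consistent with q8_1 needing no value conversion.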
For details, see the transform/transform_back functions in ggml_cann.cpp.
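A hedged sketch of the q4_1 value step and its inverse, in the spirit of transform/transform_back but not taken from ggml_cann.cpp. The names `q4_1_to_signed` and `q4_1_from_signed` are hypothetical; signed values are held one per `int8_t` rather than packed as int4, and `d` and `m` are kept as `float`, so the sketch sidesteps whatever fp16 rounding a real implementation storing m + 8d in half precision would incur (the PR does not say how that value is stored).

```c
#include <assert.h>
#include <stdint.h>

#define QK4_1 32

/* Hypothetical: convert one block's unsigned nibbles (low nibble = element j,
   high nibble = element j + 16) to signed values in [-8, 7] and return the
   compensated offset m + 8d, so that d * q_signed + m' == d * q + m. */
static float q4_1_to_signed(const uint8_t qs[QK4_1 / 2], float d, float m,
                            int8_t q_signed[QK4_1]) {
    for (int j = 0; j < QK4_1 / 2; j++) {
        q_signed[j]             = (int8_t)((qs[j] & 0x0F) - 8);
        q_signed[j + QK4_1 / 2] = (int8_t)((qs[j] >> 4) - 8);
    }
    return m + 8.0f * d;
}

/* Hypothetical inverse: repack the signed values into unsigned nibbles and
   undo the compensation, returning the original offset m. */
static float q4_1_from_signed(const int8_t q_signed[QK4_1], float d, float m_comp,
                              uint8_t qs[QK4_1 / 2]) {
    for (int j = 0; j < QK4_1 / 2; j++) {
        uint8_t lo = (uint8_t)(q_signed[j] + 8);
        uint8_t hi = (uint8_t)(q_signed[j + QK4_1 / 2] + 8);
        assert(lo <= 15 && hi <= 15);  /* inputs must lie in [-8, 7] */
        qs[j] = (uint8_t)(lo | (hi << 4));
    }
    return m_comp - 8.0f * d;
}
```

With d = 0.5 and m = -1, a nibble of 1 dequantizes to 0.5 * 1 - 1 = -0.5, and after conversion the pair (q' = -7, m' = 3) gives 0.5 * (-7) + 3 = -0.5, the same value.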
Test results:
With q4_1 and q8_1 as type_a, all of the no-permutation cases supported by the original project pass.
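Notes:
Even when -b CANN is specified, test_backend_ops still tests the CPU fallback backend. For testing, a placeholder for the q8_1_vec_dot forward logic on the CPU was added, but this change to the test_backend file was not kept in this PR. To test q8_1, temporarily disable adding the CPU backend.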