From 217de7c83fff1496f5ae5db8edeef1de07c1c75b Mon Sep 17 00:00:00 2001
From: Jack-Khuu
Date: Wed, 5 Jun 2024 14:04:08 -0700
Subject: [PATCH] Update quant overview for 021 (#3845)

Summary:
Pull Request resolved: https://github.com/pytorch/executorch/pull/3845

Reviewed By: Gasoonjia

Differential Revision: D58176137

Pulled By: Jack-Khuu

fbshipit-source-id: bdaf01a8fb66ba3333c3b6d7802c3bb02b20c4a5
(cherry picked from commit f48f392ee874f8d0cd0251ded63b38d8365677d0)
---
 docs/source/quantization-overview.md | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/docs/source/quantization-overview.md b/docs/source/quantization-overview.md
index 3a56fb4577f..e80cfd2eb83 100644
--- a/docs/source/quantization-overview.md
+++ b/docs/source/quantization-overview.md
@@ -14,3 +14,57 @@ Backend developers will need to implement their own ``Quantizer`` to express how
 Modeling users will use the ``Quantizer`` specific to their target backend to quantize their model, e.g. ``XNNPACKQuantizer``.
 
 For an example quantization flow with ``XNNPACKQuantizer``, more documentation, and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
+
+## Source Quantization: Int8DynActInt4WeightQuantizer
+
+In addition to export-based quantization (described above), ExecuTorch also supports source-based quantization, accomplished via [torchao](https://github.com/pytorch/ao). Unlike export-based quantization, source-based quantization directly modifies the model prior to export. One specific example is `Int8DynActInt4WeightQuantizer`.
+
+This scheme represents 4-bit weight quantization with 8-bit dynamic quantization of activations during inference.
+
+Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class is instantiated with a specified dtype precision and groupsize, and its ``quantize`` method mutates a provided ``nn.Module``.
+
+```
+# Source Quant: mutate the model's weights in place before export.
+from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
+
+# torch_dtype and group_size are user-chosen settings, e.g. torch.float32 and 128.
+model = Int8DynActInt4WeightQuantizer(precision=torch_dtype, groupsize=group_size).quantize(model)
+
+# Export to ExecuTorch
+from executorch.exir import to_edge
+from torch.export import export
+
+# Each "..." below stands for the remaining arguments, e.g. the example
+# inputs used to trace the model in torch.export.export.
+exported_model = export(model, ...)
+et_program = to_edge(exported_model, ...).to_executorch(...)
+```
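+
+Once lowered, the resulting program can be serialized for on-device execution. A minimal sketch, assuming the ``et_program`` from the snippet above and a hypothetical ``model.pte`` output path:
+
+```
+# Write the serialized program to a .pte file that the ExecuTorch runtime can load.
+with open("model.pte", "wb") as f:
+    f.write(et_program.buffer)
+```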
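+
+For comparison, the export-based flow mentioned earlier quantizes the captured graph rather than the source model. Below is a minimal sketch using ``XNNPACKQuantizer`` and the PT2E APIs available at the time of writing; ``model`` and ``example_inputs`` are placeholders, and the tutorial linked above remains the authoritative reference:
+
+```
+from torch._export import capture_pre_autograd_graph
+from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
+from torch.ao.quantization.quantizer.xnnpack_quantizer import (
+    XNNPACKQuantizer,
+    get_symmetric_quantization_config,
+)
+
+# Configure the backend-specific quantizer.
+quantizer = XNNPACKQuantizer()
+quantizer.set_global(get_symmetric_quantization_config())
+
+# Capture the graph, insert observers, calibrate, then convert.
+captured = capture_pre_autograd_graph(model, example_inputs)
+prepared = prepare_pt2e(captured, quantizer)
+prepared(*example_inputs)  # run representative inputs through the observers
+quantized = convert_pt2e(prepared)
+```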