Update quant overview for 021 (#3845)
Summary: Pull Request resolved: #3845

Reviewed By: Gasoonjia

Differential Revision: D58176137

Pulled By: Jack-Khuu

fbshipit-source-id: bdaf01a8fb66ba3333c3b6d7802c3bb02b20c4a5
(cherry picked from commit f48f392)
Jack-Khuu authored and pytorchbot committed Jun 5, 2024
commit 217de7c83fff1496f5ae5db8edeef1de07c1c75b
22 changes: 22 additions & 0 deletions docs/source/quantization-overview.md
@@ -14,3 +14,25 @@ Backend developers will need to implement their own ``Quantizer`` to express how
Modeling users will use the ``Quantizer`` specific to their target backend to quantize their model, e.g. ``XNNPACKQuantizer``.

For an example quantization flow with ``XNNPACKQuantizer``, more documentation, and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
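
As a rough illustration of that export-based flow, the sketch below wires ``XNNPACKQuantizer`` into the PT2E quantization APIs. ``MyModel``, the example inputs, and the exact capture/import paths are assumptions (these have moved between PyTorch releases), so treat the linked tutorial as authoritative.

```python
# Minimal sketch of PT2E quantization with XNNPACKQuantizer.
# Module paths and the capture API vary across PyTorch releases;
# see the linked tutorial for the authoritative steps.
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = MyModel().eval()                         # hypothetical eager-mode model
example_inputs = (torch.randn(1, 3, 224, 224),)  # hypothetical example inputs

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

captured = capture_pre_autograd_graph(model, example_inputs)
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)                        # calibrate with representative data
quantized = convert_pt2e(prepared)
```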

## Source Quantization: Int8DynActInt4WeightQuantizer

In addition to export-based quantization (described above), ExecuTorch also highlights source-based quantization, accomplished via [torchao](https://github.com/pytorch/ao). Unlike export-based quantization, source-based quantization directly modifies the model prior to export. One specific example is `Int8DynActInt4WeightQuantizer`.

This scheme represents 4-bit weight quantization with 8-bit dynamic quantization of activations during inference.
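
To make the scheme concrete, here is a minimal sketch of what groupwise symmetric 4-bit weight quantization computes. It is illustrative arithmetic only, not the torchao implementation; the function name and group handling are assumptions.

```python
import torch

def int4_groupwise_quantize(w: torch.Tensor, group_size: int):
    """Illustrative groupwise symmetric int4 quantization of a 2-D weight.

    Assumes in_features is divisible by group_size.
    """
    out_features, in_features = w.shape
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group maps the group's max magnitude onto the int4 range [-8, 7].
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 7.0
    q = torch.clamp(torch.round(groups / scales), min=-8, max=7).to(torch.int8)
    return q, scales

# Weights are quantized once, ahead of time, as above; activations are instead
# quantized to int8 dynamically at inference time, with scales computed from
# each activation tensor on the fly.
```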

Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class is instantiated with a specified dtype precision and group size, and its ``quantize`` method mutates a provided ``nn.Module``.

```python
# Source Quant
import torch
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

# Illustrative values; pick the precision and group size for your deployment.
torch_dtype = torch.float32
group_size = 128

model = Int8DynActInt4WeightQuantizer(precision=torch_dtype, groupsize=group_size).quantize(model)

# Export to ExecuTorch
from executorch.exir import to_edge
from torch.export import export

exported_model = export(model, ...)  # pass the model's example inputs in place of ...
et_program = to_edge(exported_model, ...).to_executorch(...)
```
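
From there, the program is typically serialized for on-device execution. A minimal sketch, assuming the ``buffer`` property of the returned ``ExecutorchProgramManager`` and a hypothetical ``model.pte`` output path:

```python
# Write the serialized program to a .pte file for the ExecuTorch runtime.
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```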