[Torch] Experimental support for FX-quantized models #10091
jroesch merged 15 commits into apache:main
Conversation
```python
amin = get_v(inputs[1], np.finfo(np.float32).min)
amax = get_v(inputs[2], np.finfo(np.float32).max)
if min_only:
    amin = get_v(inputs[1], np.finfo(np.float32).min)
```
This isn't introduced in this PR, but is using float32's max prudent here? We might have all sorts of dtypes as inputs; also, -inf should probably not be clamped to -3.4028235e+38.
Good catch, fixed. But I'm not sure what to do about clamping inf.
I think -inf should stay -inf if we clamp with only max. PyTorch uses separate implementation kernels that do only the required ops for this (kind of the reverse of what we do here); I don't know whether that would be a good choice for TVM.
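For reference, PyTorch's one-sided clamp passes -inf through unchanged (a quick check, not code from this PR):

```python
import torch

x = torch.tensor([float("-inf"), 0.0, 10.0])
# clamp_max only bounds from above, so -inf is preserved
print(torch.clamp_max(x, 5.0))  # tensor([-inf, 0., 5.])
```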
I see, that makes sense. Probably the best way for us is to allow a None value in the clip op for one-way clipping, and apply only the max or the min inside topi (see the sketch below). I left a TODO item for that.
Other op conversions probably also have issues with inf handling...
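Until clip accepts None bounds, one-way clamping can also be composed from existing Relay ops instead of using a float32 sentinel. A minimal sketch of that idea, illustrative only and not the code in this PR (the helper name `clamp` and the float32 dtype are assumptions):

```python
from tvm import relay

def clamp(data, amin=None, amax=None):
    """One-way or two-way clamp without sentinel bounds."""
    if amin is not None and amax is not None:
        return relay.clip(data, amin, amax)
    if amin is not None:
        # Only clamp from below; +inf inputs pass through unchanged.
        return relay.maximum(data, relay.const(amin, "float32"))
    if amax is not None:
        # Only clamp from above; -inf inputs pass through unchanged.
        return relay.minimum(data, relay.const(amax, "float32"))
    return data
```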
Can somebody merge this PR? I'm not aware of anyone who is familiar with PT quantization, but I believe the change should be a no-brainer. This is a nice feature to land, which was also requested in https://discuss.tvm.apache.org/t/pytorch-qnn-cannot-import-torchscript-produced-by-fx-graph-mode-quantization/11954
* works on resnet18 and deeplabv3
* yolo5 conversion worked
* fixed sigmoid
* [Torch] Support clamp_min, clamp_max
* fixed clamp_min
* fixed quantize for 1 dim input
* cleanup
* improve inline_qparam impl
* add clamp_min/max test
* add fx quant test
* cleanup
* skip build in testing
* black
* improve clamp conversion
* leave TODO on inf handling
This is the first step toward supporting models quantized with the FX-based workflow as described in https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html.
The required change was surprisingly simple: a bit of graph surgery, done by `inline_input_quant_params_for_fx(...)` in `qnn_torch.py`, is enough. So far, I was able to quantize imagenet models, deeplab v3, ssd-vgg, and yolov5, either fully or semi-automatically; see the attached test cases. Since my current interest is to collect real-world quantized workloads for performance benchmarking, I didn't care about calibration.
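For context, here is a minimal sketch of the intended end-to-end workflow, following the tutorial linked above (assuming torch ~1.10, where `prepare_fx` takes a model and a qconfig dict; the input name "input" is arbitrary, and calibration is skipped here just as in the tests):

```python
import torch
import torchvision
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx
from tvm import relay

model = torchvision.models.resnet18(pretrained=True).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)
# ... feed calibration data through `prepared` here ...
quantized = convert_fx(prepared)

# Script/trace the quantized model and import it into Relay as usual.
inp = torch.rand(1, 3, 224, 224)
script_module = torch.jit.trace(quantized, inp).eval()
mod, params = relay.frontend.from_pytorch(script_module, [("input", inp.shape)])
```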
Also added `aten::clamp_min` support for SSD-VGG.

@comaniac @lhutton1 @junrushao1994 @siju-samuel @t-vi @AndrewZhaoLuo