Add support for int8 input/output #4729
Closed
Labels
• partner: arm (for backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)
• triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
🚀 The feature, motivation and pitch
Background:
FP32 arithmetic is typically avoided in the embedded (microcontroller) domain because of tight cycle and memory constraints, so sensors usually produce integer data. The input and output of an int8-quantized NN should therefore ideally be of integer dtype (int8) as well, in order to save cycles and memory.
Current behavior:
Input/output is always fp32. Example:
Notes:
• In this example, “accelerated subgraph” is a node (subgraph) delegated to e.g. an NPU such as Ethos-U.
• For the Arm TOSA delegate, we have implemented a workaround (#3056) that tags the q/dq nodes directly connected to the input/output so that the delegate does not consume those nodes.
• As a result, the q and dq nodes above are executed on the CPU, which costs memory and cycles.
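To make the cost concrete, here is a minimal sketch of the arithmetic that a q node and a dq node perform, assuming standard affine int8 quantization. The function names, the scale and the zero point are illustrative assumptions, not values or APIs from ExecuTorch or the Arm delegate. It also assumes the sensor already delivers int8 data in the same quantization as the model input, in which case the fp32 round trip changes nothing.

```python
def quantize(x: float, scale: float, zero_point: int) -> int:
    """Sketch of a 'q' node: fp32 -> int8 using affine quantization."""
    q = round(x / scale) + zero_point
    # Clamp to the int8 range.
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Sketch of a 'dq' node: int8 -> fp32."""
    return (q - zero_point) * scale


# Illustrative quantization parameters (assumed, not from a real model).
SCALE, ZERO_POINT = 0.5, 0

# An int8 sample as a sensor might deliver it (assumed to share SCALE/ZERO_POINT).
sensor_sample = 42

# With an fp32 graph signature, the sample is first widened to fp32 ...
as_fp32 = dequantize(sensor_sample, SCALE, ZERO_POINT)
# ... and the q node on the CPU then converts it straight back to int8.
roundtrip = quantize(as_fp32, SCALE, ZERO_POINT)
```

Under these assumptions `roundtrip` equals `sensor_sample`, so the fp32 buffer and the two conversions are pure overhead on the CPU.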
Desired behavior:
Ideally, we’d like a mechanism to change the graph signature such that the int8-quantized-NN takes int8 input:
How, where and when to do that in a way that works well with the rest of the framework is unclear.
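The requested transformation can be pictured with a toy model. The sketch below is not ExecuTorch code: `Node`, `signature` and `strip_boundary_qdq` are hypothetical names, and the graph is reduced to a linear chain of nodes. It only illustrates the intended effect, namely dropping the q node after the input and the dq node before the output so that the signature becomes int8 on both sides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    op: str     # "placeholder", "quantize", "delegate", "dequantize" or "output"
    dtype: str  # dtype of the value this node produces or passes on


def signature(nodes: list[Node]) -> tuple[str, str]:
    """Return (input dtype, output dtype) of a linear toy graph."""
    return nodes[0].dtype, nodes[-1].dtype


def strip_boundary_qdq(nodes: list[Node]) -> list[Node]:
    """Remove a q node right after the input and a dq node right before
    the output, retagging the graph input/output with the quantized dtype."""
    result = list(nodes)
    # Input side: placeholder(fp32) -> quantize(int8) becomes placeholder(int8).
    if len(result) >= 2 and result[0].op == "placeholder" and result[1].op == "quantize":
        result[0:2] = [Node("placeholder", result[1].dtype)]
    # Output side: X(int8) -> dequantize(fp32) -> output(fp32) becomes X -> output(int8).
    if len(result) >= 3 and result[-1].op == "output" and result[-2].op == "dequantize":
        result[-2:] = [Node("output", result[-3].dtype)]
    return result


# Current behavior: fp32 in, fp32 out, with q/dq executed outside the delegate.
before = [
    Node("placeholder", "fp32"),
    Node("quantize", "int8"),
    Node("delegate", "int8"),    # the accelerated subgraph
    Node("dequantize", "fp32"),
    Node("output", "fp32"),
]

# Desired behavior: the delegate is fed int8 directly and returns int8.
after = strip_boundary_qdq(before)
```

In a real exported program the same rewrite would also have to keep the quantization parameters available to the caller, which this toy model does not address.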
Alternatives
No response
Additional context
No response
RFC (Optional)
No response