A unified ML software stack within the PyTorch platform for edge devices. It defines new compiler entry points as well as a state-of-the-art runtime.
Compared to the legacy Lite Interpreter, ExecuTorch offers some major benefits:
- Performance wins compared to the Lite Interpreter
  - Faster: orders of magnitude lower framework tax on both DSP and CPU
  - Much smaller binary size: 1.5 MB (Lite Interpreter) vs. 30 KB (ExecuTorch), not counting operators
  - Smaller memory footprint: ExecuTorch plans memory ahead of time and gives clear, granular control over where runtime allocations happen
- Long-term alignment with the direction of PyTorch infrastructure
  - The Lite Interpreter relies on TorchScript, which is being phased out; ExecuTorch is its planned replacement
- Model authoring and productivity gains
  - More and better-defined entry points for model-, device-, and/or use-case-specific optimizations (e.g. better backend delegation, user-defined compiler transformations, and default or user-defined memory planning); see the sketch after this list
  - Ability to lower constructs such as dynamic control flow to run on device
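
As a concrete illustration of these entry points, the following is a minimal sketch of the ahead-of-time flow, assuming the `torch.export` and `executorch.exir` Python APIs; the `TinyModel` module, its input shape, and the output file name are illustrative placeholders. Constructs such as dynamic control flow (e.g. `torch.cond`) can be captured by `torch.export` and lowered through the same flow.

```python
import torch
from executorch.exir import to_edge


class TinyModel(torch.nn.Module):
    """Placeholder model used only to illustrate the flow."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


# 1. Capture the model as an exported graph (the entry point into the compiler).
exported = torch.export.export(TinyModel().eval(), (torch.randn(1, 16),))

# 2. Lower to the Edge dialect, where device- and use-case-specific
#    transformations and backend delegation can be applied.
edge = to_edge(exported)

# 3. Emit the ExecuTorch program; memory planning runs ahead of time here.
executorch_program = edge.to_executorch()

# 4. Serialize the .pte file that the on-device runtime loads.
with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```
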
Highlights of the ExecuTorch runtime:

- Minimal binary size (< 50 KB, not including kernels)
- Minimal framework tax: program loading, executor initialization, kernel and backend-delegate dispatch, and runtime memory utilization
- Portable (cross-compiles across many toolchains)
- Executes ATen kernels (or ATen custom kernels)
- Executes custom op kernels
- Supports inter-op asynchronous execution
- Supports static memory allocation (heapless)
- Supports custom allocation across memory hierarchies
- Supports control flow needed by models
- Allows selective build of kernels
- Allows backend delegation with a lightweight interface; see the sketch below
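
Continuing the sketch above, backend delegation and ahead-of-time memory planning are configured from the export side. This sketch assumes the XNNPACK `XnnpackPartitioner`, `ExecutorchBackendConfig`, and `MemoryPlanningPass` APIs; exact import paths and signatures may differ between ExecuTorch releases.

```python
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackPartitioner,
)
from executorch.exir import ExecutorchBackendConfig
from executorch.exir.passes import MemoryPlanningPass

# `edge` is the EdgeProgramManager produced in the previous sketch.
# Partition and delegate the subgraphs XNNPACK can run; everything else
# falls back to the portable kernels.
edge = edge.to_backend(XnnpackPartitioner())

# Plan tensor lifetimes ahead of time so the runtime can execute out of
# preallocated arenas (static, heapless allocation on device).
executorch_program = edge.to_executorch(
    ExecutorchBackendConfig(memory_planning_pass=MemoryPlanningPass())
)

with open("tiny_model_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```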