This issue proposes a new opaque device-specific storage type in WebNN, MLBuffer. MLBuffer is a backend-agnostic storage type (CPU, GPU, NPU, etc) which can be used in WebNN operations.
MLBuffer would be the solution to:
- Give WebNN developers control of device storage, avoiding round-trips to/from the CPU.
- Provide a primitive that could be extended with import/export to support WebNN interop with other web APIs.
Construction/Destruction
```webidl
typedef [EnforceRange] unsigned long long MLSize64;

dictionary MLBufferDescriptor {
  required MLSize64 size;
};

[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  MLBuffer createBuffer(MLBufferDescriptor descriptor);
};
```
- Layout of MLBuffer is always known (and linear access is assumed).
```webidl
typedef unsigned long long MLSize64Out;

[Exposed=(Window, DedicatedWorker)]
interface MLBuffer {
  [CallWith=Isolate] undefined destroy();
  readonly attribute MLSize64Out size;
};
```
- WebNN developers should prefer calling destroy() over relying on GC, for predictable device-memory usage.
- destroy() is called on the context timeline, but the memory is not actually released until the device signals completion of any outstanding work.
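The lifetime rules above can be modeled with a small in-memory sketch. This is a hypothetical mock for illustration only (FakeMLContext and FakeMLBuffer are invented names, and a plain ArrayBuffer stands in for device memory):

```javascript
// Hypothetical mock of createBuffer()/destroy() semantics.
// FakeMLContext/FakeMLBuffer are illustrative names, not WebNN API.
class FakeMLBuffer {
  constructor(size) {
    this._size = size;
    this._data = new ArrayBuffer(size); // stand-in for device memory
  }
  // size remains readable even after destroy(), per the IDL attribute.
  get size() {
    return this._size;
  }
  destroy() {
    this._data = null; // release backing storage; safe to call twice
  }
}

class FakeMLContext {
  createBuffer(descriptor) {
    return new FakeMLBuffer(descriptor.size);
  }
}

const ctx = new FakeMLContext();
const buf = ctx.createBuffer({ size: 16 });
console.log(buf.size); // 16
buf.destroy(); // prefer explicit destroy() over waiting for GC
```

A real implementation would defer the actual release to the device timeline; the sketch only models the script-visible behavior.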
Upload/Download tensor data
```webidl
[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  undefined writeBuffer(
      MLBuffer dstBuffer,
      MLSize64 dstOffset,
      AllowSharedBufferSource srcData,
      optional MLSize64 srcOffset = 0,
      optional MLSize64 srcSize);

  [Exposed=Window]
  Promise<ArrayBuffer> readBuffer(
      MLBuffer srcBuffer,
      MLSize64 srcOffset,
      MLSize64 srcSize);

  [Exposed=DedicatedWorker]
  undefined readBufferSync(
      MLBuffer srcBuffer,
      MLSize64 srcOffset,
      MLSize64 srcSize,
      AllowSharedBufferSource dstData);
};
```
- Transfer operations will execute on the device timeline in the same order they were enqueued on the context timeline.
- A copy of srcData is always made, and control returns to the web developer immediately.
- Read-back is asynchronous on the window (readBuffer) and synchronous in dedicated workers (readBufferSync).
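The "a copy of srcData is always made" rule can be illustrated with a plain-JS model of writeBuffer. This is a sketch: an ordinary Uint8Array stands in for device storage, and writeBufferSketch is an invented helper, not the WebNN method:

```javascript
// Sketch of writeBuffer() copy semantics. A Uint8Array stands in for
// device storage; the observable rule is that srcData is copied before
// the call returns, so later mutation of srcData has no effect.
function writeBufferSketch(dstBytes, dstOffset, srcData, srcOffset = 0, srcSize) {
  const srcView = new Uint8Array(
      srcData.buffer ?? srcData,
      (srcData.byteOffset ?? 0) + srcOffset,
      srcSize ?? (srcData.byteLength - srcOffset));
  dstBytes.set(srcView, dstOffset); // the copy happens here, up front
}

const deviceBytes = new Uint8Array(8); // pretend device storage
const src = new Uint8Array([1, 2, 3, 4]);
writeBufferSketch(deviceBytes, 2, src);
src.fill(9); // mutating srcData afterwards does not affect the buffer
console.log(Array.from(deviceBytes)); // [0, 0, 1, 2, 3, 4, 0, 0]
```

A real implementation stages the copy for the device timeline, but the script-visible contract is the same.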
Binding to graphs
```webidl
dictionary MLBufferView {
  required MLBuffer buffer;
  MLSize64 offset = 0;
  MLSize64 size;
};

typedef record<DOMString, MLBufferView> MLNamedMLBufferViews;

[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  undefined dispatch(
      MLGraph graph,
      MLNamedMLBufferViews inputs,
      MLNamedMLBufferViews outputs);
};
```
- Buffer usage is always assumed on first access (e.g. a buffer passed in outputs assumes output usage).
- The WebNN developer must call readBuffer() to get the resulting output back from an MLBuffer after dispatch().
```js
const bufferA = context.createBuffer({size: 16}); // 4 floats = 16 bytes
context.writeBuffer(bufferA, 0, new Float32Array(4).fill(1.0));
const bufferB = context.createBuffer({size: 16});
const inputs = {'A': {buffer: bufferA}};
const outputs = {'B': {buffer: bufferB}};
context.dispatch(graph, inputs, outputs);
const result = await context.readBuffer(bufferB, 0, 16);
```
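Since MLBufferView.size is optional, a reasonable reading (assumed here by analogy with WebGPU-style buffer bindings, and not something this proposal pins down) is that it defaults to the remainder of the buffer past offset. A sketch of that resolution logic, with resolveBufferView as an invented helper:

```javascript
// Hypothetical resolution of an MLBufferView into concrete {offset, size}.
// "size defaults to the rest of the buffer" is an assumption borrowed from
// WebGPU-style bindings, not stated by this proposal.
function resolveBufferView(view) {
  const offset = view.offset ?? 0;
  if (offset > view.buffer.size) {
    throw new RangeError('offset exceeds buffer size');
  }
  const size = view.size ?? (view.buffer.size - offset);
  if (offset + size > view.buffer.size) {
    throw new RangeError('view extends past end of buffer');
  }
  return { offset, size };
}

const mlBuffer = { size: 16 }; // stand-in for an MLBuffer
console.log(resolveBufferView({ buffer: mlBuffer }));            // { offset: 0, size: 16 }
console.log(resolveBufferView({ buffer: mlBuffer, offset: 4 })); // { offset: 4, size: 12 }
```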
Edits:
- 12/15: added MLBuffer dispatch instead of overloading compute() per https://www.w3.org/2023/12/14-webmachinelearning-minutes.html
- 12/15: fixed createBuffer return - should have been non-optional.
- 1/10: edit to rename MLNamedMLBufferViews => MLNamedMLBufferResourceViews
- 1/10: added readBufferSync
- 1/17: renamed MLBufferResource => MLBuffer