This issue proposes a new opaque device-specific storage type in WebNN, MLBuffer. MLBuffer is a backend-agnostic storage type (CPU, GPU, NPU, etc) which can be used in WebNN operations.
MLBuffer would be the solution to:
- Give WebNN developers control of device storage, avoiding round-trips to/from the CPU.
- Provide a primitive that could be extended with import/export to support WebNN interop with other web APIs.
Construction/Destruction
```webidl
typedef [EnforceRange] unsigned long long MLSize64;

dictionary MLBufferDescriptor {
  required MLSize64 size;
};

[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  MLBuffer createBuffer(MLBufferDescriptor descriptor);
};
```
- Layout of MLBuffer is always known (and linear access is assumed).
```webidl
typedef unsigned long long MLSize64Out;

[Exposed=(Window, DedicatedWorker)]
interface MLBuffer {
  [CallWith=Isolate] undefined destroy();
  readonly attribute MLSize64Out size;
};
```
- WebNN developers should prefer calling destroy() over relying on GC, for predictable device-memory usage.
- destroy() is called on the context timeline, but the memory is not actually released until the device signals completion of any outstanding work.
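The lifetime rules above can be modeled with a small in-memory sketch. This is a hypothetical mock for illustration only (FakeMLContext and FakeMLBuffer are invented names, and a plain ArrayBuffer stands in for device memory):

```javascript
// Hypothetical mock of createBuffer()/destroy() semantics.
// FakeMLContext/FakeMLBuffer are illustrative names, not WebNN API.
class FakeMLBuffer {
  constructor(size) {
    this._size = size;
    this._data = new ArrayBuffer(size); // stand-in for device memory
  }
  // size remains readable even after destroy(), per the IDL attribute.
  get size() {
    return this._size;
  }
  destroy() {
    this._data = null; // release backing storage; safe to call twice
  }
}

class FakeMLContext {
  createBuffer(descriptor) {
    return new FakeMLBuffer(descriptor.size);
  }
}

const ctx = new FakeMLContext();
const buf = ctx.createBuffer({ size: 16 });
console.log(buf.size); // 16
buf.destroy(); // prefer explicit destroy() over waiting for GC
```

A real implementation would defer the actual release to the device timeline; the sketch only models the script-visible behavior.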
Upload/Download tensor data
```webidl
[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  undefined writeBuffer(
      MLBuffer dstBuffer,
      MLSize64 dstOffset,
      AllowSharedBufferSource srcData,
      optional MLSize64 srcOffset = 0,
      optional MLSize64 srcSize);

  [Exposed=Window]
  Promise<ArrayBuffer> readBuffer(
      MLBuffer srcBuffer,
      MLSize64 srcOffset,
      MLSize64 srcSize);

  [Exposed=DedicatedWorker]
  undefined readBufferSync(
      MLBuffer srcBuffer,
      MLSize64 srcOffset,
      MLSize64 srcSize,
      AllowSharedBufferSource dstData);
};
```
- Transfer operations will execute on the device timeline in the same order they were enqueued on the context timeline.
- A copy of srcData is always made, and control returns to the web developer immediately.
- Read-back is asynchronous on the window (readBuffer) and synchronous in dedicated workers (readBufferSync).
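The "a copy of srcData is always made" rule can be illustrated with a plain-JS model of writeBuffer. This is a sketch: an ordinary Uint8Array stands in for device storage, and writeBufferSketch is an invented helper, not the WebNN method:

```javascript
// Sketch of writeBuffer() copy semantics. A Uint8Array stands in for
// device storage; the observable rule is that srcData is copied before
// the call returns, so later mutation of srcData has no effect.
function writeBufferSketch(dstBytes, dstOffset, srcData, srcOffset = 0, srcSize) {
  const srcView = new Uint8Array(
      srcData.buffer ?? srcData,
      (srcData.byteOffset ?? 0) + srcOffset,
      srcSize ?? (srcData.byteLength - srcOffset));
  dstBytes.set(srcView, dstOffset); // the copy happens here, up front
}

const deviceBytes = new Uint8Array(8); // pretend device storage
const src = new Uint8Array([1, 2, 3, 4]);
writeBufferSketch(deviceBytes, 2, src);
src.fill(9); // mutating srcData afterwards does not affect the buffer
console.log(Array.from(deviceBytes)); // [0, 0, 1, 2, 3, 4, 0, 0]
```

A real implementation stages the copy for the device timeline, but the script-visible contract is the same.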
Binding to graphs
```webidl
dictionary MLBufferView {
  required MLBuffer buffer;
  MLSize64 offset = 0;
  MLSize64 size;
};

typedef record<DOMString, MLBufferView> MLNamedMLBufferViews;

[Exposed=(Window, DedicatedWorker), SecureContext]
interface MLContext {
  undefined dispatch(
      MLGraph graph,
      MLNamedMLBufferViews inputs,
      MLNamedMLBufferViews outputs);
};
```
- Buffer usage is always assumed on first access (e.g. a buffer passed in outputs assumes output usage).
- The WebNN developer must call readBuffer() to get the resulting output back from an MLBuffer after dispatch().
```js
const bufferA = context.createBuffer({size: 16}); // 4 floats = 16 bytes
context.writeBuffer(bufferA, 0, new Float32Array(4).fill(1.0));
const bufferB = context.createBuffer({size: 16});
const inputs = {'A': {buffer: bufferA}};
const outputs = {'B': {buffer: bufferB}};
context.dispatch(graph, inputs, outputs);
const result = await context.readBuffer(bufferB, 0, 16);
```
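Since MLBufferView.size is optional, a reasonable reading (assumed here by analogy with WebGPU-style buffer bindings, and not something this proposal pins down) is that it defaults to the remainder of the buffer past offset. A sketch of that resolution logic, with resolveBufferView as an invented helper:

```javascript
// Hypothetical resolution of an MLBufferView into concrete {offset, size}.
// "size defaults to the rest of the buffer" is an assumption borrowed from
// WebGPU-style bindings, not stated by this proposal.
function resolveBufferView(view) {
  const offset = view.offset ?? 0;
  if (offset > view.buffer.size) {
    throw new RangeError('offset exceeds buffer size');
  }
  const size = view.size ?? (view.buffer.size - offset);
  if (offset + size > view.buffer.size) {
    throw new RangeError('view extends past end of buffer');
  }
  return { offset, size };
}

const mlBuffer = { size: 16 }; // stand-in for an MLBuffer
console.log(resolveBufferView({ buffer: mlBuffer }));            // { offset: 0, size: 16 }
console.log(resolveBufferView({ buffer: mlBuffer, offset: 4 })); // { offset: 4, size: 12 }
```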
Edits:
- 12/15: added MLBuffer dispatch instead of overloading compute() per https://www.w3.org/2023/12/14-webmachinelearning-minutes.html
- 12/15: fixed createBuffer return - should have been non-optional.
- 1/10: edit to rename MLNamedMLBufferViews => MLNamedMLBufferResourceViews
- 1/10: added readBufferSync
- 1/17: renamed MLBufferResource => MLBuffer