-
Notifications
You must be signed in to change notification settings - Fork 262
[core] Support negative startIndex for streaming #1460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5d243a3
2241748
40ba940
749e64b
45d474d
4a22114
34ef29e
a17b11b
8b994fc
23ce02d
89939b3
7a2ee25
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| --- | ||
| "@workflow/world": patch | ||
| "@workflow/world-local": patch | ||
| "@workflow/world-postgres": patch | ||
| "@workflow/core": patch | ||
| --- | ||
|
|
||
| Support negative `startIndex` for streaming (e.g. `-3` reads last 3 chunks) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -614,49 +614,96 @@ describe('e2e', () => { | |
| // Output stream tests use run.getReadable() which requires in-process streaming | ||
| // infrastructure. The local world's streamer uses an EventEmitter that doesn't work | ||
| // cross-process (test runner ↔ workbench app). | ||
| test.skipIf(isLocalDeployment())( | ||
| 'outputStreamWorkflow', | ||
| { timeout: 60_000 }, | ||
| async () => { | ||
| const run = await start(await e2e('outputStreamWorkflow'), []); | ||
| const reader = run.getReadable().getReader(); | ||
| const namedReader = run.getReadable({ namespace: 'test' }).getReader(); | ||
|
|
||
| // First chunk from default stream: binary data | ||
| const r1 = await reader.read(); | ||
| assert(r1.value); | ||
| assert(r1.value instanceof Uint8Array); | ||
| expect(Buffer.from(r1.value).toString()).toEqual('Hello, world!'); | ||
|
|
||
| // First chunk from named stream: binary data | ||
| const r1Named = await namedReader.read(); | ||
| assert(r1Named.value); | ||
| assert(r1Named.value instanceof Uint8Array); | ||
| expect(Buffer.from(r1Named.value).toString()).toEqual( | ||
| 'Hello, named stream!' | ||
| ); | ||
|
|
||
| // Second chunk from default stream: JSON object | ||
| const r2 = await reader.read(); | ||
| assert(r2.value); | ||
| expect(r2.value).toEqual({ foo: 'test' }); | ||
| // | ||
| // outputStreamWorkflow writes 2 chunks to the default stream: | ||
| // chunk 0: binary "Hello, world!" | ||
| // chunk 1: object { foo: 'test' } | ||
| // and 2 chunks to the "test" named stream: | ||
| // chunk 0: binary "Hello, named stream!" | ||
| // chunk 1: object { foo: 'bar' } | ||
| describe.skipIf(isLocalDeployment())('outputStreamWorkflow', () => { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are we skipping this test in local deployment? 🤔 this might be outdated and we should probably re-enable streaming tests in local deployments
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's already being done so this PR isn't changing it. I might try re-enabling in the next PR
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah just fly by noticed this |
||
| const startIndexCases = [ | ||
| { | ||
| name: 'no startIndex (reads all chunks)', | ||
| startIndex: undefined, | ||
| expectedDefault: [ | ||
| { type: 'binary', value: 'Hello, world!' }, | ||
| { type: 'object', value: { foo: 'test' } }, | ||
| ], | ||
| expectedNamed: [ | ||
| { type: 'binary', value: 'Hello, named stream!' }, | ||
| { type: 'object', value: { foo: 'bar' } }, | ||
| ], | ||
| // Can stream in real-time without waiting for completion | ||
| waitForCompletion: false, | ||
| }, | ||
| { | ||
| name: 'positive startIndex (skips first chunk)', | ||
| startIndex: 1, | ||
| expectedDefault: [{ type: 'object', value: { foo: 'test' } }], | ||
| expectedNamed: [{ type: 'object', value: { foo: 'bar' } }], | ||
| // Positive startIndex needs the stream written up to that point | ||
| waitForCompletion: true, | ||
| }, | ||
| { | ||
| name: 'negative startIndex (reads from end)', | ||
| startIndex: -1, | ||
| expectedDefault: [{ type: 'object', value: { foo: 'test' } }], | ||
| expectedNamed: [{ type: 'object', value: { foo: 'bar' } }], | ||
| // Negative startIndex resolves at connection time using knownChunkCount, | ||
| // so the stream must be fully written before connecting the reader. | ||
| waitForCompletion: true, | ||
| }, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The e2e coverage for negative startIndex only tests
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will add a broader test in the next PR |
||
| ] as const; | ||
|
|
||
| for (const tc of startIndexCases) { | ||
| test(tc.name, { timeout: 60_000 }, async () => { | ||
| const run = await start(await e2e('outputStreamWorkflow'), []); | ||
|
|
||
| if (tc.waitForCompletion) { | ||
| await run.returnValue; | ||
| } | ||
|
|
||
| // Second chunk from named stream: JSON object | ||
| const r2Named = await namedReader.read(); | ||
| assert(r2Named.value); | ||
| expect(r2Named.value).toEqual({ foo: 'bar' }); | ||
| const reader = run | ||
| .getReadable({ startIndex: tc.startIndex }) | ||
| .getReader(); | ||
| const namedReader = run | ||
| .getReadable({ namespace: 'test', startIndex: tc.startIndex }) | ||
| .getReader(); | ||
|
|
||
| for (const expected of tc.expectedDefault) { | ||
| const { value } = await reader.read(); | ||
| assert(value); | ||
| if (expected.type === 'binary') { | ||
| assert(value instanceof Uint8Array); | ||
| expect(Buffer.from(value).toString()).toEqual(expected.value); | ||
| } else { | ||
| expect(value).toEqual(expected.value); | ||
| } | ||
| } | ||
|
|
||
| // Streams should be closed | ||
| const r3 = await reader.read(); | ||
| expect(r3.done).toBe(true); | ||
| // Default stream should be closed after expected chunks | ||
| expect((await reader.read()).done).toBe(true); | ||
|
|
||
| for (const expected of tc.expectedNamed) { | ||
| const { value } = await namedReader.read(); | ||
| assert(value); | ||
| if (expected.type === 'binary') { | ||
| assert(value instanceof Uint8Array); | ||
| expect(Buffer.from(value).toString()).toEqual(expected.value); | ||
| } else { | ||
| expect(value).toEqual(expected.value); | ||
| } | ||
| } | ||
|
|
||
| const r3Named = await namedReader.read(); | ||
| expect(r3Named.done).toBe(true); | ||
| // Named stream should be closed after expected chunks | ||
| expect((await namedReader.read()).done).toBe(true); | ||
|
|
||
| const returnValue = await run.returnValue; | ||
| expect(returnValue).toEqual('done'); | ||
| const returnValue = await run.returnValue; | ||
| expect(returnValue).toEqual('done'); | ||
| }); | ||
| } | ||
| ); | ||
| }); | ||
|
|
||
| test.skipIf(isLocalDeployment())( | ||
| 'outputStreamInsideStepWorkflow - getWritable() called inside step functions', | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -341,9 +341,33 @@ export function createStreamer(basedir: string, tag?: string): Streamer { | |
| .filter((file) => file.startsWith(`${name}-`)) | ||
| .sort(); // ULID lexicographic sort = chronological order | ||
|
|
||
| // Resolve negative startIndex relative to the number of data chunks | ||
| // (excluding the trailing EOF marker chunk, if present). | ||
| let dataChunkCount = chunkFiles.length; | ||
| if ( | ||
| typeof startIndex === 'number' && | ||
| startIndex < 0 && | ||
| chunkFiles.length > 0 | ||
| ) { | ||
| const lastFile = chunkFiles[chunkFiles.length - 1]; | ||
| const lastExt = fileExtMap.get(lastFile) ?? '.bin'; | ||
| // Note: this incurs an extra disk read to check the EOF marker. | ||
| // Acceptable since negative startIndex is not a hot path. | ||
| const lastChunk = deserializeChunk( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This reads and deserializes the last chunk file just to check |
||
| await readBuffer(path.join(chunksDir, `${lastFile}${lastExt}`)) | ||
| ); | ||
| if (lastChunk?.eof === true) { | ||
| dataChunkCount--; | ||
| } | ||
| } | ||
| const resolvedStartIndex = | ||
| typeof startIndex === 'number' && startIndex < 0 | ||
| ? Math.max(0, dataChunkCount + startIndex) | ||
| : startIndex; | ||
|
|
||
| // Process existing chunks, skipping any already delivered via events | ||
| let isComplete = false; | ||
| for (let i = startIndex; i < chunkFiles.length; i++) { | ||
| for (let i = resolvedStartIndex; i < chunkFiles.length; i++) { | ||
| const file = chunkFiles[i]; | ||
| // Extract chunk ID from filename: "streamName-chunkId" or "streamName-chunkId.tag" | ||
| const rawChunkId = file.substring(name.length + 1); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,7 @@ function httpLog( | |
| ); | ||
| } | ||
| } | ||
|
|
||
| import { | ||
| ErrorType, | ||
| getSpanKind, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add some note/callout to be careful about paginating on this. Because streams are live and will continue enqueueing chunks, relative numbers will map to different absolute chunk "IDs" on subsequent calls. I think accurate pagination is only possible when cursor based, and streams don't support cursor based pagination yet so clients should account for that limitation
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a problem when using
run.getReadabledirectly, but it is a big problem for resumption, i.e. WorkflowChatTransport. I'm working on this in a follow-up PR together with a metadata/tail endpoint, and cursor-based pagination. I'll add a comment in this PR to mention the limitation