fix(producer): force single worker for html-in-canvas compositions by miguel-heygen · Pull Request #1052 · heygen-com/hyperframes

miguel-heygen · 2026-05-23T19:46:39Z

Summary

Detects html-in-canvas API usage (<canvas layoutsubtree>) during composition compilation and adds it as a new RenderModeHintCode
Forces worker count to 1 when detected — Chrome's drawElementImage does not support concurrent usage across multiple browser instances and produces flickering artifacts
Overrides both auto-sizing and explicit --workers flags (with a warning log when the user explicitly requested >1)

Test plan

New unit test: detectRenderModeHints detects layoutsubtree canvas attribute
New unit test: detectRenderModeHints ignores plain canvas without layoutsubtree
New unit test: resolveRenderWorkerCount forces 1 worker on htmlInCanvas hint
New unit test: resolveRenderWorkerCount overrides explicit --workers 8 with warning
All 99 existing tests pass
Manual: render a liquid-glass composition and verify no flickering

Chrome's drawElementImage API does not support concurrent usage across multiple browser instances — running >1 capture worker causes flickering artifacts. Detect the layoutsubtree canvas attribute during compilation and unconditionally pin workers to 1, overriding both auto-sizing and explicit --workers flags.

jrusso1020

Verdict: COMMENT (would-be APPROVE)

Workaround is well-shaped: surgical detection, tight test coverage, override path keeps the user informed. Two non-blocking observations on workaround-quality vs root-cause framing. Holding for James's stamp greenlight + Magi's deeper root-cause investigation (which will land separately per the delegation).

Surgical scope is right ✓

PR title reads "force single worker for html-in-canvas comps" which could be interpreted as "force-single across the board." The actual implementation is more targeted:

detectRenderModeHints greps the compiled HTML for canvas[layoutsubtree] — the marker for the Chrome experimental html-in-canvas API.
Only when that selector matches does resolveRenderWorkerCount clamp to 1.
Plain canvas comps (three.js, audio waveform, ProgressIndicator, etc.) keep their multi-worker parallelism.

Composition-level surgical, not global. ✓

Sub-composition scope confirmed ✓

I was concerned the per-HTML-string detection might miss canvas[layoutsubtree] usage in sub-compositions — but verified at htmlCompiler.ts:934-944:

const inlinedHtml = inlineSubCompositions(fullHtml, subCompositions, projectDir);
// ...
const sanitizedHtml = inlinedHtml.replace(...);
const renderModeHints = detectRenderModeHints(sanitizedHtml);

detectRenderModeHints runs on the flattened HTML (with all sub-comps inlined), so html-in-canvas usage anywhere in the composition tree triggers the worker clamp. No sub-comp scope gap. ✓

Test coverage solid ✓

Four new tests pin both halves of the behavior:

Detection: canvas[layoutsubtree] triggers / plain canvas doesn't
Worker clamp: auto-sized → 1 / explicit --workers 8 → 1 with warning

The "with warning" assertion (expect(log.warn).toHaveBeenCalledOnce()) specifically pins the user-facing log line, so a future "let's parallelize again" refactor would fail loudly. Good shape — matches the spirit of the briefing's "test that pins the prior behavior so future drift fails loudly."

Observation 1 (worth addressing) — no TODO / tracking-issue link

The two user-facing messages explain what is happening:

"Detected html-in-canvas API (layoutsubtree canvas). Chrome does not support concurrent drawElementImage across multiple workers; render is pinned to a single worker."

But neither references:

A Chromium bug ID (Crbug? Issue tracker?)
An internal tracking issue for Magi's deeper investigation
A // TODO(@miguel-heygen): comment marking this as a workaround to remove

The risk: when the root cause is eventually fixed upstream (or our deeper investigation finds the missing-determinism angle Miguel called out for Magi), nobody knows to remove this clamp. It becomes permanent cruft, capping html-in-canvas renders at 1 worker forever.

Suggested shape (one-line addition to the warning + a code comment):

// TODO(htmlInCanvas): workaround — Chrome's experimental drawElementImage
// API is non-deterministic across concurrent browser instances.
// Remove this clamp once root cause is identified.
// Tracking: <link to Magi's investigation findings or chromium bug>
if (reasonCodes.has("htmlInCanvas")) {
  // ...
}

Cheap insurance against the workaround outliving its cause.

Observation 2 (optional) — no escape hatch for power users

resolveRenderWorkerCount clamps to 1 unconditionally when htmlInCanvas is detected, even when the user passes --workers 8 explicitly. The warning message tells the user what happened, but there's no way to override.

For the common case this is correct — preventing users from foot-gunning with a flickering output is good UX. But:

A developer testing whether the bug is fixed in a newer Chrome can't compare multi-worker output without a code change.
A user who's confident their composition is safe (maybe they're only animating, not capturing) has no opt-out.

An optional --force-workers N flag (distinct from --workers N) would be the clean shape — explicit acknowledgment that the user is overriding the safety check. Or --workers 8 --html-in-canvas-unsafe. Not blocking; the current shape is defensible.

Sanity checks (no concerns)

Performance cost on the clamped subset: workers go from auto-sized (8 typically?) to 1 → ~8x slowdown for html-in-canvas renders. Trade-off is right given the alternative is unusable flickering output. Worth noting in the PR body that the cost is bounded to the html-in-canvas subset of compositions, not all renders.
canvas[layoutsubtree] selector matches any value of the attribute — correct, since the spec uses it as a boolean attribute (presence = on).
drawElementImage JS API without layoutsubtree: per the experimental spec, drawElementImage requires the canvas to have layoutsubtree enabled — so detection on the attribute alone covers the JS-call surface too. ✓
CI: 20 success / 10 in_progress / 0 failures.

Pre-merge ask (non-blocking)

Add a TODO + tracking-issue reference so this doesn't become permanent. The reference can point to Magi's investigation findings once those land, or a Chromium issue if we file one. The shape matters more than the exact link target.

Workaround quality is otherwise solid. Deferring to Magi on the root-cause question (Chrome experimental feature vs missing-determinism in our side).

— Rames Jusso

vanceingalls

Detection is clean and narrow — canvas[layoutsubtree] selector in detectRenderModeHints, propagated through the RenderModeHintCode union to an early-exit in resolveRenderWorkerCount. Tests cover the positive case, the negative case (plain canvas), the auto-sizing override, and the explicit --workers override with the warn log. The shape is consistent with how other render-mode hints (iframe, raf) are wired.

Strengths:

captureCost.ts:116–126: early-exit before the cost estimate path is clean and correct. Single-worker enforcement is total — no edge where the estimate path can slide under it.
htmlCompiler.test.ts:416–441: positive/negative detection tests are well-formed with realistic HTML fixtures.
Warn log on the --workers > 1 override is the right UX signal.

important — silent override on the auto-sized path

captureCost.ts:118: the warn fires only when requestedWorkers !== undefined && requestedWorkers > 1. An auto-sized render (the majority of production jobs, where requestedWorkers is undefined) is silently pinned to 1 worker with no log at all. Operators have no signal that single-worker mode was engaged unless they happen to check workerCount downstream. The warn should also fire (or a separate log.info) on the auto path so the behavior is observable. Suggested shape:

log.warn(
  "[Render] html-in-canvas (drawElementImage) detected — pinning to 1 worker (Chrome concurrency limitation).",
  { requestedWorkers },
);
return 1;

Single branch, no condition needed.

important — htmlInCanvas implicitly forces screenshot mode; test doesn't pin this

htmlCompiler.ts:130: detectRenderModeHints returns recommendScreenshot: reasons.length > 0. Adding htmlInCanvas to reasons means every html-in-canvas composition is routed to screenshot capture mode via applyRenderModeHints (shared.ts:212), in addition to being pinned to 1 worker. The PR body doesn't mention this side effect, and the new test at htmlCompiler.test.ts:416 only asserts reasons — it does not assert recommendScreenshot. If this is intentional (screenshot + single-worker both apply), the test should pin it. If it's unintentional, htmlInCanvas needs a carve-out in detectRenderModeHints analogous to how the code already avoids double-applying hints in other places.

Manual testing passed, so at minimum screenshot mode is compatible with drawElementImage — but the coupling should be explicit in the test.

important — no escape hatch; no follow-up issue

Single-worker enforcement cannot be bypassed. If the workaround misfires on a composition that doesn't actually flicker, or if the root-cause fix lands separately and the detection is over-broad, there is no --force-workers override or env var that unblocks the operator. Consider a HYPERFRAMES_OVERRIDE_WORKERS env or a cfg.disableHtmlInCanvasWorkerPin flag that lets the team A/B without a code change.

Also: the PR body describes this as a fix for non-determinism but doesn't reference a follow-up issue to investigate the actual root cause in Chrome's drawElementImage implementation. A linked issue would close the loop on the workaround framing.

nit — wasted calibration on auto-sized html-in-canvas renders

renderOrchestrator.ts:1702: the calibration run is gated on job.config.workers === undefined && totalFrames >= 60. For auto-sized renders of html-in-canvas compositions, calibration will still execute (potentially 600ms–3s), then resolveRenderWorkerCount throws the estimate away and returns 1. The early-exit in resolveRenderWorkerCount doesn't propagate up to skip the calibration. Not a correctness issue, but a latency cost on every affected render. A pre-check before the calibration block using the hint codes would skip it:

if (job.config.workers === undefined && totalFrames >= 60 && !compiled.renderModeHints.reasons.some(r => r.code === "htmlInCanvas")) {
  // run calibration
}

CI note: all required checks were pending at review time; no failures.

Verdict: APPROVE
Reasoning: The core mechanism is correct — single-worker enforcement is total, detection is precise, and manual testing on the affected composition type passed. The important findings are observability and documentation gaps, not correctness breaks. Ship with follow-up for the escape hatch and the recommendScreenshot coupling test.

— Vai

miguel-heygen

Review: Magi — root-cause investigation + code review

Root-cause investigation (Miguel's question)

Verdict: Chrome's drawElementImage IS the root cause. Not a HyperFrames bug.

drawElementImage() is an experimental API behind --enable-features=CanvasDrawElement that rasterizes live DOM subtrees into a <canvas>. The non-determinism in parallel rendering comes from two Chrome-internal mechanisms:

Paint cache race — drawElementImage reads from the compositor's internal paint records. When multiple Chrome instances run on the same machine, CPU/GPU contention causes the compositor to serve stale or incomplete paint records to the canvas rasterizer. This produces flickering artifacts (frames where the canvas content is partially rendered or shows the previous frame's state).
SwiftShader contention — Both distributed and in-process renders run browserGpuMode: "software" (SwiftShader). Multiple SwiftShader instances contend on shared OpenGL state, and drawElementImage's rasterizer shares that pipeline. A single Chrome instance avoids this contention entirely.

The API is behind an experimental flag for a reason — it isn't stable for concurrent use. There's no workaround on our side short of waiting for Chromium to stabilize it.

Distributed rendering is unaffected — renderChunk.ts already hardcodes workerCount: 1 (line 544), so each Temporal activity runs a single Chrome instance per chunk on separate processes/machines, avoiding the shared-state problem.

Code review

Detection is correct: canvas[layoutsubtree] attribute check runs on the flattened HTML (after inlineSubCompositions), so sub-composition usage is caught. The resolveRenderWorkerCount short-circuit is clean.

Non-blocking (1): recommendScreenshot side effect — the htmlInCanvas reason being pushed into reasons also forces screenshot capture mode (recommendScreenshot: reasons.length > 0). This is actually correct and necessary: the micro-screenshot paint-force protocol in frameCapture.ts:740 only runs when captureMode !== "beginframe", and drawElementImage depends on that forced paint. But the test doesn't assert recommendScreenshot is true, so this critical coupling is invisible. Suggest adding expect(result.recommendScreenshot).toBe(true) to pin this.

Non-blocking (2): Agree with Rames on the tracking TODO. A TODO(htmlInCanvas): remove when Chromium stabilizes CanvasDrawElement for concurrent use would prevent this from becoming permanent cruft.

LGTM ✓

— Magi

jrusso1020

Following Magi's root-cause finding + acknowledging three Vai catches I missed

Magi's root-cause conclusion changes my framing

My prior review framed this as a "workaround that might become permanent cruft." Magi's investigation shifts that:

"drawElementImage is an experimental API behind --enable-features=CanvasDrawElement ... two Chrome-internal mechanisms ... There's no workaround on our side short of waiting for Chromium to stabilize it."

So the clamp IS the correct fix given the upstream API's design, not a temporary patch over our own bug. My "TODO + tracking issue" ask still applies, but the framing should shift from "remove when fixed" to "document the two mechanisms (paint cache race + SwiftShader contention) so a future maintainer understands why this clamp is permanent until Chromium stabilizes CanvasDrawElement for concurrent use."

Three observations I missed, verified independently at the code

Silent override on auto-sized renders (Vai #1). Verified at captureCost.ts:118-126: the warn fires only when requestedWorkers !== undefined && requestedWorkers > 1. Auto-sized jobs (where requestedWorkers === undefined — the majority of prod renders) take the silent return-1 path. Operators have no observability that single-worker mode engaged. Real gap; Vai's suggested fix (move warn out of the explicit-workers guard, fire on every htmlInCanvas detection) is the right shape.
htmlInCanvas implicitly forces screenshot mode, untested (Vai #2 / Magi non-blocking #1). Verified at htmlCompiler.ts: recommendScreenshot: reasons.length > 0. Adding htmlInCanvas to reasons flips recommendScreenshot to true. Per Magi's deeper note, this is actually correct and necessary — the micro-screenshot paint-force protocol in frameCapture.ts:740 only runs when captureMode !== "beginframe", and drawElementImage depends on that forced paint. So the coupling is load-bearing, not incidental. The test should pin it with expect(result.recommendScreenshot).toBe(true).
Calibration waste on auto-sized htmlInCanvas renders (Vai nit). Verified at renderOrchestrator.ts: if (job.config.workers === undefined && totalFrames >= 60) runs runCaptureCalibration without checking the hint codes. For html-in-canvas auto-sized renders, calibration runs (600ms-3s) and then resolveRenderWorkerCount immediately throws the estimate away by early-exiting to 1. The hint-code check Vai suggests would skip the calibration upfront.

Why I missed these

I focused review effort on "is the surgical scope right" + "is the workaround quality good" per the briefing, but didn't audit the observability surface (auto-vs-explicit log-fire branches) or trace the side effects of adding to reasons (the recommendScreenshot coupling). Both are exactly the kind of "check the second-order effects of the data structure change" lens I should apply on PRs that thread state through shared types like RenderModeHints.

Verdict: APPROVE (consensus with Vai + Magi)

Magi confirmed the clamp is the correct fix; Vai approved with observability + test-pinning asks; my review aligned on surgical scope + missed the observability gaps. All three reviews converge on APPROVE-on-merit with the same set of non-blocking follow-ups (auto-path log, test pinning recommendScreenshot, escape hatch, tracking comment with Magi's two mechanisms documented).

Holding for James's stamp greenlight.

— Rames Jusso

- Always log when html-in-canvas pins to 1 worker, not just on explicit --workers override (Vance, James) - Assert recommendScreenshot is true for htmlInCanvas detection — pins the load-bearing coupling with the screenshot paint-force protocol (Vance, Magi) - Skip capture calibration for htmlInCanvas auto-sized renders to avoid wasting 600ms–3s on an estimate that gets thrown away (Vance) - Add TODO documenting Chrome's two root-cause mechanisms: paint cache race and SwiftShader contention (James, Magi)

miguel-heygen force-pushed the fix/htmlincanvas-single-worker branch from 4fd03e0 to 497f220 Compare May 23, 2026 19:48

jrusso1020 reviewed May 23, 2026

View reviewed changes

vanceingalls approved these changes May 23, 2026

View reviewed changes

miguel-heygen commented May 23, 2026

View reviewed changes

jrusso1020 approved these changes May 23, 2026

View reviewed changes

miguel-heygen merged commit 0845dc9 into main May 23, 2026
41 checks passed

miguel-heygen deleted the fix/htmlincanvas-single-worker branch May 23, 2026 20:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(producer): force single worker for html-in-canvas compositions#1052

fix(producer): force single worker for html-in-canvas compositions#1052
miguel-heygen merged 2 commits into
mainfrom
fix/htmlincanvas-single-worker

miguel-heygen commented May 23, 2026 •

edited

Loading

Uh oh!

jrusso1020 left a comment

Uh oh!

vanceingalls left a comment

Uh oh!

miguel-heygen left a comment

Uh oh!

jrusso1020 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

miguel-heygen commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

jrusso1020 left a comment

Choose a reason for hiding this comment

Verdict: COMMENT (would-be APPROVE)

Surgical scope is right ✓

Sub-composition scope confirmed ✓

Test coverage solid ✓

Observation 1 (worth addressing) — no TODO / tracking-issue link

Observation 2 (optional) — no escape hatch for power users

Sanity checks (no concerns)

Pre-merge ask (non-blocking)

Uh oh!

vanceingalls left a comment

Choose a reason for hiding this comment

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Review: Magi — root-cause investigation + code review

Root-cause investigation (Miguel's question)

Code review

Uh oh!

jrusso1020 left a comment

Choose a reason for hiding this comment

Following Magi's root-cause finding + acknowledging three Vai catches I missed

Magi's root-cause conclusion changes my framing

Three observations I missed, verified independently at the code

Why I missed these

Verdict: APPROVE (consensus with Vai + Magi)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

miguel-heygen commented May 23, 2026 •

edited

Loading