
perf: skip temporal accumulation when disabled #2

Open
tonyblu331 wants to merge 1 commit into MarioAndF:main from tonyblu331:temporal-opt

tonyblu331 commented Apr 27, 2026

This is a follow-up to the ongoing refinements of the N8AO WebGPU/TSL implementation. The shader is a solid base to build on, and this is a quick optimization that removes one full-screen pass and two clears per frame in the default configuration.


Summary

Optimizes the pipeline by skipping unnecessary GPU work when temporal accumulation is disabled (the default).

When the user has not enabled temporal accumulation, the code was still clearing two accumulation targets to black each frame and running a full-screen accumulation pass that did nothing more than copy the blur output through an identity mix(). This change removes that waste.


Why this matters

The default configuration has accumulate: false, so most users hit this path on every frame. The accumulation pass only earns its keep when the camera is static and noise is being blended across successive frames. When it is off, the pass is dead work.


Changes

src/N8AONode.ts -- Two gates:

  1. Target clearing only fires when accumulate is true.
  2. The accumulation render pass only fires when accumulate is true. When disabled, the composite node is pointed directly at the blur output (readTarget.texture).
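
The two gates above can be sketched roughly as follows. This is a minimal illustration against a counting mock, not the code in src/N8AONode.ts: `MockRenderer`, `Target`, `Config`, `runAccumulationStage`, `accumA`, and `accumB` are hypothetical names, and the ping-pong and firstFrame handling are left out. Only accumulate, readTarget.texture, and the composite texture node's .value come from the description.

```typescript
// Hypothetical stand-ins for the real renderer and render targets.
interface Target { texture: string }
interface Config { accumulate: boolean }

class MockRenderer {
  clears = 0;
  passes = 0;
  clear(_target: Target): void { this.clears++; }
  renderPass(_output: Target): void { this.passes++; }
}

// Stand-in for compositeAoTextureNode: only its .value is rebound.
const compositeAoTextureNode = { value: "" };

function runAccumulationStage(
  renderer: MockRenderer,
  config: Config,
  readTarget: Target, // holds the blur output
  accumA: Target,
  accumB: Target,
): void {
  // Gate 1: clear the accumulation targets only when accumulation is on.
  if (config.accumulate) {
    renderer.clear(accumA);
    renderer.clear(accumB);
  }
  // Gate 2: run the accumulation pass only when accumulation is on;
  // otherwise bind the blur output directly to the composite.
  if (config.accumulate) {
    renderer.renderPass(accumA);
    compositeAoTextureNode.value = accumA.texture;
  } else {
    compositeAoTextureNode.value = readTarget.texture;
  }
}

const renderer = new MockRenderer();
const config: Config = { accumulate: false };
const blurTarget: Target = { texture: "blur" };
const accumA: Target = { texture: "accumA" };
const accumB: Target = { texture: "accumB" };

// Frame with accumulation off: no clears, no pass, composite reads blur.
runAccumulationStage(renderer, config, blurTarget, accumA, accumB);
const offResult = {
  clears: renderer.clears,
  passes: renderer.passes,
  bound: compositeAoTextureNode.value,
};

// Toggle at runtime: the flag is read again on the next frame.
config.accumulate = true;
runAccumulationStage(renderer, config, blurTarget, accumA, accumB);
const onResult = {
  clears: renderer.clears,
  passes: renderer.passes,
  bound: compositeAoTextureNode.value,
};
```

Because the flag is read on every call, toggling accumulate takes effect on the next frame without rebuilding anything in this sketch.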

src/math.test.ts -- Floating-point comparisons now use toBeCloseTo via a compact expectCloseTo helper. The previous exact-equality assertions were brittle across platforms and toolchain versions.
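
A helper of this kind might look like the sketch below. It is written without a test framework so that it runs on its own; the helper in src/math.test.ts is described as wrapping toBeCloseTo, and this version only mirrors the Jest-style tolerance rule (|expected - actual| < 10^-digits / 2, with 2 digits by default). The name `isCloseTo` and the throwing body of `expectCloseTo` are assumptions.

```typescript
// Tolerance check mirroring the Jest-style toBeCloseTo criterion.
function isCloseTo(actual: number, expected: number, digits = 2): boolean {
  return Math.abs(expected - actual) < Math.pow(10, -digits) / 2;
}

// Hypothetical compact helper: throws instead of delegating to expect().
function expectCloseTo(actual: number, expected: number, digits = 2): void {
  if (!isCloseTo(actual, expected, digits)) {
    throw new Error(
      `expected ${actual} to be close to ${expected} (${digits} digits)`,
    );
  }
}

// The classic case where exact equality is brittle.
const sum = 0.1 + 0.2;
const exactlyEqual = sum === 0.3;            // exact comparison fails
const closeEnough = isCloseTo(sum, 0.3, 10); // tolerance comparison passes
expectCloseTo(sum, 0.3);                     // does not throw
```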


Benchmark

The following numbers come from a mock-renderer pipeline trace (benchmark.mjs) that records render pass and clear counts per frame. Each configuration was run for 1000 iterations.

| Configuration | Before (main) | After (temporal-opt) |
| --- | --- | --- |
| Default (accumulate=off, denoise=2) | 5 passes, 2 clears | 4 passes, 0 clears |
| High quality (accumulate=off, denoise=4) | 7 passes, 2 clears | 6 passes, 0 clears |
| Half-res (accumulate=off, denoise=2, halfRes) | 6 passes, 2 clears | 5 passes, 0 clears |
| Accumulation on (accumulate=on) | 5 passes, 2 clears | 5 passes, 2 clears (identical) |

The accumulation-enabled path is unchanged. The savings appear in every other configuration, which covers the paths that run for the majority of users.

Rough GPU-operation equivalents at 1080p (half-res AO = 960x540):

  • 1 accumulation pass: ~518k fragment shader invocations
  • 2 clear operations: ~1M pixel writes
  • 2 extra texture samples in the accumulation shader

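Summed (about 518k + 1.04M + 1.04M, counting the two texture samples once per fragment), this is about 2.6 million GPU operations per frame at 1080p. At 4K (half-res AO = 1920x1080) the figure is roughly 10.4 million operations per frame.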
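
One way to tally the items listed above is sketched here, assuming each clear write and each texture sample counts as one operation per half-res pixel. That counting rule is an assumption about how the figures were derived, and `estimateSavedOps` is a hypothetical helper, not part of benchmark.mjs.

```typescript
// Per-pixel operation counts taken from the three bullet items:
// 1 fragment invocation + 2 clear writes + 2 texture samples.
const OPS_PER_PIXEL = 1 + 2 + 2;

// Estimate the operations saved per frame for a given output resolution.
// AO runs at half resolution per axis when halfRes is true.
function estimateSavedOps(width: number, height: number, halfRes = true): number {
  const scale = halfRes ? 0.5 : 1;
  const pixels = Math.floor(width * scale) * Math.floor(height * scale);
  return pixels * OPS_PER_PIXEL;
}

const at1080p = estimateSavedOps(1920, 1080); // 960 * 540 * 5 = 2,592,000
const at4k = estimateSavedOps(3840, 2160);    // 1920 * 1080 * 5 = 10,368,000
```

Under this counting the 1080p total is about 2.6 million operations per frame and the 4K total about 10.4 million.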


Safety

  • Pixel-identical. When accumulate is off, the accumulation shader was computing mix(black, blur, 1.0) = blur. Now the composite reads blur directly.
  • Runtime toggle works. The configuration proxy triggers firstFrame when accumulate changes, and the next updateBefore picks up the new value.
  • Shader unchanged. compositeAoTextureNode is a dynamic texture node. Changing its .value only changes which GPU texture is bound -- no recompilation.
  • All existing features preserved: halfRes, transparencyAware, denoiseIterations, camera motion.
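
The pixel-identical claim rests on mix() with a weight of 1.0 returning its second argument. A small CPU-side check of that identity, using the standard GLSL/WGSL definition mix(a, b, t) = a * (1 - t) + b * t, could look like this. The `Vec3` type and the sample color are illustrative assumptions; the real computation happens in the shader.

```typescript
type Vec3 = [number, number, number];

// Component-wise linear interpolation: a * (1 - t) + b * t.
function mix(a: Vec3, b: Vec3, t: number): Vec3 {
  return [
    a[0] * (1 - t) + b[0] * t,
    a[1] * (1 - t) + b[1] * t,
    a[2] * (1 - t) + b[2] * t,
  ];
}

const black: Vec3 = [0, 0, 0];
const blur: Vec3 = [0.25, 0.5, 0.75]; // arbitrary sample AO/blur value

// With t = 1.0 the first operand is multiplied by 0 and drops out.
const accumulated = mix(black, blur, 1.0);
const identical = accumulated.every((c, i) => c === blur[i]);
```

With black as the first operand and t = 1.0, each component reduces to 0 + b, so the result equals the blur value exactly rather than approximately.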

Test results

All 4 tests pass across both the N8AO math utilities and the pipeline logic.

- Gate accumulation target clearing: only clear when accumulate=true
- Skip accumulation render pass when accumulate=false (default)
- Point composite directly to blur output when accumulate disabled
- Fix floating point precision tests with toBeCloseTo()

Saves 1 render pass + 2 clears per frame (~2M-8M pixel ops)