Add pixelization imaging profiling: A100 + RTX 2060 + CPU sweep by Jammy2211 · Pull Request #57 · PyAutoLabs/autolens_workspace_developer

Jammy2211 · 2026-05-10T10:53:30Z

Summary

Adds long-term tracking artifacts for the rectangular pixelization imaging likelihood under jax_profiling/results/jit/imaging/pixelization/ — six configs side-by-side (CPU/GPU × fp64/mp on consumer hardware + A100 fp64/mp). Generated by new tooling in z_projects/profiling/scripts/ (separate local-only commit, no PR target).

Likelihood: Sersic + Isothermal + ExternalShear lens with a RectangularAdaptDensity(28, 28) source mesh + Constant regularization. Mirrors the canonical reference at jax_profiling/jit/imaging/pixelization.py. Companion to the MGE sweep merged in #56 — same harness, different model, an extra three steps (Overlay grid, Regularization matrix H, Regularized reconstruction) on top of the MGE 8-step pipeline.

Headline numbers

Config	Full pipeline	vmap per call
hpc_a100_fp64	9.7 ms	12.3 ms
hpc_a100_mp	10.1 ms	12.4 ms
local_gpu_fp64 (RTX 2060)	212.2 ms	233.1 ms
local_gpu_mp	192.6 ms	212.1 ms
local_cpu_fp64	2379.5 ms	2157.6 ms
local_cpu_mp	1670.1 ms	1878.5 ms
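
The ratios quoted under Key findings can be recomputed directly from this table. A minimal sketch, assuming only the six full-pipeline values above; the dictionary layout and the helper name mp_change_percent are illustrative and are not the schema of comparison.json:

```python
# Full-pipeline timings in milliseconds, copied from the headline table.
# The dict layout is illustrative, not the comparison.json schema.
full_pipeline_ms = {
    "hpc_a100_fp64": 9.7,
    "hpc_a100_mp": 10.1,
    "local_gpu_fp64": 212.2,
    "local_gpu_mp": 192.6,
    "local_cpu_fp64": 2379.5,
    "local_cpu_mp": 1670.1,
}

# Cross-device speedup factors for the fp64 configs.
a100 = full_pipeline_ms["hpc_a100_fp64"]
a100_vs_rtx2060 = full_pipeline_ms["local_gpu_fp64"] / a100
a100_vs_cpu = full_pipeline_ms["local_cpu_fp64"] / a100


def mp_change_percent(prefix):
    """Percent change in full-pipeline time from fp64 to mp (negative = faster)."""
    fp64 = full_pipeline_ms[f"{prefix}_fp64"]
    mp = full_pipeline_ms[f"{prefix}_mp"]
    return 100.0 * (mp - fp64) / fp64


print(f"A100 vs RTX 2060 (fp64): {a100_vs_rtx2060:.1f}x")
print(f"A100 vs CPU (fp64):      {a100_vs_cpu:.1f}x")
for prefix in ("hpc_a100", "local_gpu", "local_cpu"):
    print(f"{prefix}: mp vs fp64 = {mp_change_percent(prefix):+.1f}%")
```

This gives roughly 21.9× and 245.3× for the two cross-device ratios, and mp changes of about +4.1% (A100), -9.2% (RTX 2060) and -29.8% (CPU).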

Key findings

A100 fp64 is 22× faster than RTX 2060 fp64 on the full single-JIT pipeline, and 245× faster than CPU fp64. The production-vs-consumer gap is materially wider than for MGE (PR #56, "Add MGE imaging profiling: A100 + RTX 2060 + CPU sweep", measured 7.7× A100 vs RTX 2060). Pixelization's dense linear algebra benefits more from the A100's tensor cores + memory bandwidth than MGE's smaller linear-LP solve does.
Bottleneck shifts dramatically across device classes.
- On CPU: Curvature matrix F (1317 ms) + Inversion setup (1228 ms) account for ~90% of step total.
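- On RTX 2060: F (102 ms / 48%), Inversion setup (63 ms / 30%) and Reconstruction NNLS (60 ms / 28%) are the three dominant contributors. The percentages are consistent with shares of the 212.2 ms full-pipeline time rather than of the step total, which is why they sum to more than 100%.
- On A100: F construction collapses to 0.53 ms (about 1/190th of the RTX 2060 time and about 1/2500th of the CPU time). NNLS reconstruction (6.8 ms) becomes ~70% of step total. Optimising NNLS is the next throughput lever for production hardware.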
Mixed precision makes no meaningful difference on GPUs (within noise: A100 mp is ~4% slower than fp64; RTX 2060 mp is ~9% faster). On CPU, mp cuts full-pipeline time by ~30%, mostly from F construction (1317 → 875 ms). The use_mixed_precision flag remains a CPU lever, not a GPU one, which is the same conclusion as for MGE.
vmap does not help pixelization (the vmap per-call time is 0.9–1.3× the single-JIT time across the six configs). Contrast with MGE, which gets ~2× from vmap. Root cause is the inherently iterative NNLS solve in reconstruction_positive_only_from, which does not batch usefully. Batched pixelization evaluation needs a different reconstruction strategy.
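
The vmap comparison can be checked per config from the headline table. A minimal sketch, assuming only the two timing columns above (the names timings_ms and vmap_ratio are illustrative). A ratio above 1 means the vmap per-call time is higher than the single-JIT full-pipeline time:

```python
# (single-JIT full pipeline ms, vmap per-call ms) from the headline table.
timings_ms = {
    "hpc_a100_fp64": (9.7, 12.3),
    "hpc_a100_mp": (10.1, 12.4),
    "local_gpu_fp64": (212.2, 233.1),
    "local_gpu_mp": (192.6, 212.1),
    "local_cpu_fp64": (2379.5, 2157.6),
    "local_cpu_mp": (1670.1, 1878.5),
}

# Ratio of vmap per-call time to single-JIT time for each config.
vmap_ratio = {
    name: vmap_ms / single_ms for name, (single_ms, vmap_ms) in timings_ms.items()
}

# Configs where vmap costs more per call than the single-JIT pipeline.
slower_under_vmap = [name for name, ratio in vmap_ratio.items() if ratio > 1.0]

for name, ratio in vmap_ratio.items():
    print(f"{name}: vmap / single-JIT = {ratio:.2f}")
print(f"slower under vmap: {len(slower_under_vmap)} of {len(vmap_ratio)} configs")
```

On these numbers the ratio runs from about 0.91 (local_cpu_fp64) to about 1.27 (hpc_a100_fp64), and five of the six configs are slower per call under vmap.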

Caveats

A100 JIT log-evidence shows fp32-level truncation (26232.3516 vs eager numpy reference 26232.0686). Same root cause as PR #56: the HPC PyAutoNSS venv does not have jax_enable_x64=True. This doesn't affect the timing data here, and the assertion uses rtol=1e-2 for mp paths to absorb it. Worth confirming before quoting A100-served log Z values to high precision.
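
To put the size of that discrepancy in context, the sketch below compares it with the spacing of representable float32 and float64 values at this magnitude and with the two rtol values used in the test plan. It uses only the two numbers quoted in this PR; the variable names are illustrative, and the comparison is a plain relative-difference check rather than the actual assertion code.

```python
import math

EXPECTED_LOG_EVIDENCE_HST = 26232.068573757562  # eager reference quoted in this PR
a100_jit_log_evidence = 26232.3516  # A100 JIT value quoted in this PR

abs_diff = abs(a100_jit_log_evidence - EXPECTED_LOG_EVIDENCE_HST)
rel_diff = abs_diff / abs(EXPECTED_LOG_EVIDENCE_HST)

# Spacing between adjacent representable values near the reference.
# float32 stores 23 mantissa bits; math.ulp gives the float64 spacing.
exponent = math.floor(math.log2(EXPECTED_LOG_EVIDENCE_HST))
fp32_spacing = 2.0 ** (exponent - 23)
fp64_spacing = math.ulp(EXPECTED_LOG_EVIDENCE_HST)

# Simple relative-difference checks against the two tolerances in the test plan.
passes_fp64_rtol = rel_diff <= 1e-4
passes_mp_rtol = rel_diff <= 1e-2

print(f"relative difference: {rel_diff:.3e}")
print(f"float32 spacing:     {fp32_spacing:.3e}")
print(f"float64 spacing:     {fp64_spacing:.3e}")
print(f"difference in float32 spacings: {abs_diff / fp32_spacing:.0f}")
```

The difference is about 1.1e-5 in relative terms, or roughly 145 float32 spacings, which is far coarser than float64 resolution but still inside both tolerances. In JAX, 64-bit mode is normally enabled with jax.config.update("jax_enable_x64", True) or the JAX_ENABLE_X64 environment variable; whether that is the right fix for the HPC venv is not confirmed here.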
The vmap regression vs single-JIT for pixelization is real, not measurement noise. It reproduces in five of the six configs (local_cpu_fp64 is the exception, where vmap is ~9% faster per call) and is consistent with NNLS being serial. The comparison.json headline section captures both numbers explicitly.
Local sweep timings vary across sessions due to JAX cache state and GPU thermal state. Cross-platform comparisons (A100 vs RTX 2060 vs CPU) are robust; single-machine cross-session deltas are not.
The chart (comparison.png) uses log scale on the y axis to make the A100 / RTX 2060 / CPU classes coexist legibly, since they span ~3 orders of magnitude.

Generated by

z_projects/profiling/scripts/pixelization_profile.py: single-config 11-step JIT profiler (per-step timings + full pipeline + vmap + memory analysis). Argparse-driven; honours PYAUTO_ROOT for worktree-aware canonical writes.
z_projects/profiling/scripts/pixelization_aggregate.py: --ingest-pre-fix /tmp (no-op unless artifacts are present); --consolidate-from <staging> moves HPC pulls into this canonical dir; the default mode emits comparison.json + comparison.png.
z_projects/profiling/scripts/_setup_pixelization.py: shared build_dataset / build_model / build_analysis, so the canonical reference's EXPECTED_LOG_EVIDENCE_HST = 26232.068573757562 constant carries through and is asserted on every run.
z_projects/profiling/hpc/batch_gpu/submit_pixelization_profile_{fp64,mp}: A100 SLURM submit scripts.
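
As an illustration of the two conventions named above (an argparse-driven CLI and PYAUTO_ROOT-aware output paths), here is a minimal sketch. It is not the code in z_projects/profiling/scripts/. Only the flag names --ingest-pre-fix and --consolidate-from, the PYAUTO_ROOT variable and the results path come from this PR; the function names, the fallback to the current directory, the assumption that the results directory sits directly under PYAUTO_ROOT, and the argument types are all hypothetical.

```python
import argparse
import os
from pathlib import Path

# Results path quoted in this PR, relative to the repository root.
RESULTS_SUBDIR = Path("jax_profiling/results/jit/imaging/pixelization")


def resolve_results_dir(env=None, fallback=None):
    """Hypothetical helper: build the results dir under PYAUTO_ROOT if it is set.

    Falling back to the current working directory is an assumption.
    """
    env = os.environ if env is None else env
    root = env.get("PYAUTO_ROOT")
    base = Path(root) if root else Path(fallback or Path.cwd())
    return base / RESULTS_SUBDIR


def build_parser():
    """Hypothetical parser exposing the two flags named in this PR."""
    parser = argparse.ArgumentParser(description="Aggregate profiling results (sketch).")
    parser.add_argument("--ingest-pre-fix", metavar="DIR", default=None,
                        help="directory to scan for earlier artifacts (assumed semantics)")
    parser.add_argument("--consolidate-from", metavar="STAGING", default=None,
                        help="staging directory holding HPC pulls (assumed semantics)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("results dir:", resolve_results_dir())
    print("consolidate from:", args.consolidate_from)
```

Reading the root from an environment variable, rather than from the script's own location, is one way a script run from a git worktree could still write into a chosen checkout.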

The z_projects/profiling/ source side commits to its own (remote-less) main; only the result artifacts in this PR are version-tracked.

Test plan

All 6 JSON files schema-valid (parsed cleanly by pixelization_aggregate.py)
comparison.json + comparison.png regenerated end-to-end
No untracked or modified files in jax_profiling/results/jit/imaging/ outside the new pixelization/ subdir
Existing legacy per-version flat summary files (pixelization_likelihood_summary_hst_v*.{json,png}) untouched
Eager log_evidence regression assertion: 26232.068574 matches EXPECTED_LOG_EVIDENCE_HST (rtol=1e-4)
Full-pipeline JIT + vmap log_evidence assertions pass (rtol=1e-4 fp64, rtol=1e-2 mp)

🤖 Generated with Claude Code

…00 + RTX 2060 + CPU sweep

Six configs side-by-side for the rectangular pixelization imaging likelihood (Sersic + Isothermal + ExternalShear lens with a RectangularAdaptDensity(28,28) source mesh + Constant regularization) covering consumer hardware (RTX 2060 Max-Q + i9-10885H), production A100, and both fp64 + mixed-precision variants. Generated by new tooling in z_projects/profiling/scripts/ (separate local-only commit; no PR target).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jammy2211 merged commit 6a39009 into main May 10, 2026

This was referenced May 10, 2026

Add Delaunay imaging profiling: A100 + RTX 2060 sweep + three-way comparison #58

Merged

Re-profile rectangular + Delaunay at full production fiducial (1225/1231 src + MGE-60 lens) #60

Merged

Jammy2211 merged 1 commit into main from feature/pixelization-profiling-a100
