fix: raw-flux latents + soft-fail µJy when magzero missing #556

@Jammy2211

Overview

The first first-class A100 search run from autolens_profiling (HPC
job 322548) converged in ~11 min, then search.fit() crashed inside
SearchUpdater._compute_latent_samples with
ValueError: magzero must be passed to the Analysis ..., raised from
autolens/analysis/latent.py:_require_magzero. In any workspace that
enables total_*_flux_mujy in latent.yaml while building
AnalysisImaging without magzero=, this raise kills the post-fit metric
write of an otherwise-successful, possibly multi-hour, search.

This task adds raw-flux latents (no instrument inputs needed,
default-on) and converts the existing µJy raise into a soft NaN +
one-time-per-process warning, so users still get the most useful flux
quantity by default and explicit µJy opt-ins can never crash a search.
The same fix is mirrored in PyAutoGalaxy
(total_galaxy_0_flux_mujy). Workspace guides on flux + latent
variables are updated to demo the new latents and show µJy conversion.

Plan

  • Add raw-flux latents (sum of image pixels, no magzero consumed)
    to PyAutoLens (total_lens_flux, total_lensed_source_flux,
    total_source_flux) and PyAutoGalaxy (total_galaxy_0_flux).
  • Default-enable the raw-flux keys in each library's config/latent.yaml;
    leave the existing _mujy keys default-off.
  • Replace the _require_magzero raise with a _maybe_magzero_warn
    helper that returns True (+ one warning per process per latent name)
    when magzero is None, and have the three _mujy latents early-return
    xp.nan in that case. Same change in PyAutoGalaxy.
  • Update autolens_workspace and autogalaxy_workspace guides
    (scripts/guides/units/flux.py + scripts/guides/results/latent_variables.py)
    to document the new default-on latents and show µJy conversion via
    ab_mag_via_flux_from / flux_mujy_via_ab_mag_from.
  • Do not touch euclid_strong_lens_modeling_pipeline — its
    config/latent.yaml keeps _mujy: true and works unchanged because
    magzero is always supplied there.

Detailed implementation plan

Affected Repositories

  • PyAutoLens (primary)
  • PyAutoGalaxy
  • autolens_workspace
  • autogalaxy_workspace

Work Classification

Both (libraries first, workspaces follow).

Library-first merge gate

  1. PyAutoGalaxy library PR merges to main.
  2. PyAutoLens library PR (depends on PyAutoGalaxy main).
  3. Workspace PRs (autolens_workspace + autogalaxy_workspace) after both libraries land.

Branch Survey

Repository              Current Branch  Dirty?
./PyAutoLens            main            clean, but held by the weak-dataset-from-json worktree (PR #555); user opted to force-start a parallel worktree
./PyAutoGalaxy          main            clean
./autolens_workspace    main            ~150 modified files (binary leak from a previous smoke run; unrelated)
./autogalaxy_workspace  main            ~90 modified files (same pattern; unrelated)

Suggested branch: feature/flux-latents-raw (free in all four repos, locally and on origin)
Worktree root: ~/Code/PyAutoLabs-wt/flux-latents-raw/ (created by /start_library with PYAUTO_WT_FORCE=1 to bypass the PyAutoLens claim)

Implementation Steps

PyAutoGalaxy — autogalaxy/imaging/model/latent.py

  1. Add _MAGZERO_WARNED: set[str] = set() and a helper
    _maybe_magzero_warn(magzero, name) -> bool that returns True (+ one
    logger.warning(...) per name) when magzero is None, else False.
    Replaces the if magzero is None: raise ValueError(...) block at
    lines 60-66.
  2. In total_galaxy_0_flux_mujy, replace the raise with
    if _maybe_magzero_warn(magzero, "total_galaxy_0_flux_mujy"): return xp.nan.
  3. Add total_galaxy_0_flux(fit, magzero=None, xp=np) — sums
    fit.galaxy_image_dict[fit.tracer.galaxies[0]].array, returns
    xp.nan on (AttributeError, KeyError, IndexError). magzero
    accepted-but-ignored for uniform dispatcher context (same pattern as
    PyAutoLens's magnification).
  4. Register total_galaxy_0_flux in LATENT_FUNCTIONS.
  5. autogalaxy/config/latent.yaml: add total_galaxy_0_flux: true.
    Mirror in test_autogalaxy/config/latent.yaml.

PyAutoLens — autolens/analysis/latent.py

  1. Same _MAGZERO_WARNED + _maybe_magzero_warn helper, replacing
    _require_magzero at lines 31-37.
  2. Soft-fail the three _mujy latents (total_lens_flux_mujy,
    total_lensed_source_flux_mujy, total_source_flux_mujy): replace
    each _require_magzero(...) call with
    if _maybe_magzero_warn(...): return xp.nan.
  3. Add three new raw-flux latents (total_lens_flux,
    total_lensed_source_flux, total_source_flux), each
    magzero=None-accepting, summing the same image source as the µJy
    sibling, returning xp.nan on the same exception classes.
    total_source_flux reads from
    fit.tracer_linear_light_profiles_to_light_profiles to match its
    µJy sibling's linear-profile handling.
  4. Register all three in LATENT_FUNCTIONS.
  5. autolens/config/latent.yaml: set three new total_*_flux: true;
    keep _mujy keys at false. Mirror in test_autolens/config/latent.yaml.
    Update module docstring lines 9-15.
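
The raw-flux latents are described as sharing one shape: sum an image, return NaN if it cannot be reached. A hedged sketch of that shape, using the `total_galaxy_0_flux` access path quoted in the PyAutoGalaxy steps; the `SimpleNamespace` objects are hypothetical stand-ins for a real fit, and the dict form of `LATENT_FUNCTIONS` is an assumption.

```python
from types import SimpleNamespace

import numpy as np


def total_galaxy_0_flux(fit, magzero=None, xp=np):
    """Sum of the first galaxy's image pixels, in the units of the data.

    magzero is accepted but ignored so every latent shares one call signature.
    """
    try:
        galaxy = fit.tracer.galaxies[0]
        image = fit.galaxy_image_dict[galaxy].array
    except (AttributeError, KeyError, IndexError):
        return xp.nan
    return xp.sum(image)


# Assumed registry shape: latent name -> callable.
LATENT_FUNCTIONS = {"total_galaxy_0_flux": total_galaxy_0_flux}

# Hypothetical stand-ins for a fit; real fits come from the library.
galaxy = object()
fit = SimpleNamespace(
    tracer=SimpleNamespace(galaxies=[galaxy]),
    galaxy_image_dict={galaxy: SimpleNamespace(array=np.array([1.0, 2.0, 3.5]))},
)
empty_fit = SimpleNamespace(tracer=SimpleNamespace(galaxies=[]), galaxy_image_dict={})

flux = LATENT_FUNCTIONS["total_galaxy_0_flux"](fit)
missing = LATENT_FUNCTIONS["total_galaxy_0_flux"](empty_fit)  # IndexError -> NaN
```

Catching only the three listed exception classes keeps genuine bugs, such as a `TypeError` inside the sum, visible instead of turning them into NaN.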

Tests — both libraries

  • Rewrite the existing "missing magzero raises" tests to assert NaN
    return + a caplog WARNING record matching the latent name.
  • Add a test_maybe_magzero_warn_logs_only_once_per_name test that
    discards the name from _MAGZERO_WARNED, calls the latent 3×,
    asserts exactly one record.
  • Add positive raw-flux tests using the existing fixture, asserting
    xp.sum(image) semantics and np.isnan on missing image dict.
  • Update test_latent_functions_registry_keys and
    test_analysis_imaging_latent_keys_property_reads_config for the
    new keys.
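
The once-per-name test could look roughly like the sketch below. The plan calls for pytest's caplog; this version uses the standard library's `assertLogs` so that it runs without third-party test tooling, and it carries a condensed stand-in for the helper rather than importing the real module.

```python
import logging
import unittest

import numpy as np

logger = logging.getLogger("latent_sketch_test")
_MAGZERO_WARNED: set[str] = set()


def _maybe_magzero_warn(magzero, name) -> bool:
    # Condensed stand-in for the helper under test.
    if magzero is not None:
        return False
    if name not in _MAGZERO_WARNED:
        _MAGZERO_WARNED.add(name)
        logger.warning("magzero missing; '%s' latent is NaN.", name)
    return True


def total_lens_flux_mujy(fit, magzero=None, xp=np):
    # Only the missing-magzero branch matters for this test.
    if _maybe_magzero_warn(magzero, "total_lens_flux_mujy"):
        return xp.nan
    raise NotImplementedError("real computation omitted from the sketch")


class MagzeroWarnTest(unittest.TestCase):
    def test_maybe_magzero_warn_logs_only_once_per_name(self):
        # Reset state so the outcome does not depend on test ordering.
        _MAGZERO_WARNED.discard("total_lens_flux_mujy")
        with self.assertLogs(logger, level="WARNING") as captured:
            values = [total_lens_flux_mujy(fit=None) for _ in range(3)]
        self.assertTrue(all(np.isnan(v) for v in values))
        self.assertEqual(len(captured.records), 1)
        self.assertIn("total_lens_flux_mujy", captured.records[0].getMessage())
```

Discarding the name first matters because `_MAGZERO_WARNED` is process-wide: an earlier test that exercised the same latent would otherwise leave zero records to capture.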

Workspace guides — Opus (science-teaching prose)

  • autolens_workspace/scripts/guides/units/flux.py: new section on the
    three raw-flux latents now in latent.csv by default; units = same
    as dataset.data.array; µJy conversion snippet using
    ab_mag_via_flux_from + flux_mujy_via_ab_mag_from from
    autogalaxy.imaging.model.latent; pointer to enabling _mujy
    latents via workspace config/latent.yaml + passing magzero= to
    AnalysisImaging (the Euclid pipeline pattern).
  • autolens_workspace/scripts/guides/results/latent_variables.py:
    short paragraph naming the three new defaults and their physical
    meaning.
  • autogalaxy_workspace/scripts/guides/units/flux.py: mirror, but for
    the single total_galaxy_0_flux key.
  • autogalaxy_workspace/scripts/guides/results/latent_variables.py:
    mirror.
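
For orientation, the arithmetic behind the µJy conversion can be sketched with plain NumPy. The two functions below are local stand-ins, deliberately named differently from ab_mag_via_flux_from and flux_mujy_via_ab_mag_from, whose signatures are not given here; the zero-point value of 25.0 is hypothetical, and the raw flux is assumed to be in the units the zero-point is defined for.

```python
import numpy as np

# Approximate AB magnitude of a 1 µJy source (AB zero point of 3631 Jy).
AB_MAG_OF_ONE_MUJY = 23.9


def ab_mag_from_flux(flux, magzero):
    """AB magnitude of a raw flux, given the instrument zero-point."""
    return magzero - 2.5 * np.log10(flux)


def flux_mujy_from_ab_mag(ab_mag):
    """Flux density in µJy corresponding to an AB magnitude."""
    return 10.0 ** ((AB_MAG_OF_ONE_MUJY - ab_mag) / 2.5)


raw_flux = 100.0  # e.g. a total_lens_flux value read from latent.csv
magzero = 25.0  # hypothetical instrument zero-point

ab_mag = ab_mag_from_flux(raw_flux, magzero)
flux_mujy = flux_mujy_from_ab_mag(ab_mag)
print(f"AB mag = {ab_mag:.2f}, flux = {flux_mujy:.2f} µJy")
```

Because the conversion is a pure function of the raw flux and the zero-point, it can be applied after the search to the default-on raw-flux column without rerunning anything.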

Key Files

  • PyAutoGalaxy/autogalaxy/imaging/model/latent.py — helper + soft-fail + new latent + registry.
  • PyAutoGalaxy/autogalaxy/config/latent.yaml & test_autogalaxy/config/latent.yaml — enable new key.
  • PyAutoGalaxy/test_autogalaxy/imaging/model/test_latent.py — rewrite + new tests.
  • PyAutoLens/autolens/analysis/latent.py — helper + soft-fail × 3 + new latents × 3 + registry.
  • PyAutoLens/autolens/config/latent.yaml & test_autolens/config/latent.yaml — enable three new keys.
  • PyAutoLens/test_autolens/analysis/test_latent.py — rewrite + new tests.
  • autolens_workspace/scripts/guides/units/flux.py + …/results/latent_variables.py.
  • autogalaxy_workspace/scripts/guides/units/flux.py + …/results/latent_variables.py.

Verification

  1. python -m pytest test_autogalaxy/imaging/model/test_latent.py -v and
    python -m pytest test_autolens/analysis/test_latent.py -v.
  2. Full library suites: pytest test_autogalaxy/ -q and pytest test_autolens/ -q.
  3. /smoke_test on each workspace — confirm the two updated guide scripts run.
  4. End-to-end (optional): PYAUTO_TEST_MODE=1, enable both total_lens_flux: true
    and total_lens_flux_mujy: true, build AnalysisImaging without magzero,
    af.Nautilus(...).fit(...) — expect search completes, raw-flux columns finite,
    µJy columns NaN, exactly one WARNING per µJy name in the log.
  5. Euclid no-regression read of euclid_strong_lens_modeling_pipeline/config/latent.yaml
    to confirm magzero is still wired through and unchanged.

Original Prompt

Flux-mujy latents crash search.fit() when magzero is missing from the Analysis

Context

Surfaced by the first first-class A100 search run from autolens_profiling
(searches/nautilus/imaging/mge.py, HPC job 322548). Nautilus converged
cleanly (log_Z=+31690.50, 65500 evals, 11m40s) but the post-fit latent
computation crashed inside SearchUpdater._compute_latent_samples:

ValueError: magzero must be passed to the Analysis via kwargs to compute
the 'total_lens_flux_mujy' latent. Disable it in config/latent.yaml or
pass magzero=<value>.

The raise lives at
PyAutoLens:autolens/analysis/latent.py:31-37:

def _require_magzero(magzero, name):
    if magzero is None:
        raise ValueError(
            f"magzero must be passed to the Analysis via kwargs to compute "
            f"the '{name}' latent. Disable it in config/latent.yaml or "
            f"pass magzero=<value>."
        )

and is the gate for three sibling latents:

  • total_lens_flux_mujy (line 40)
  • total_lensed_source_flux_mujy (line 60)
  • total_source_flux_mujy (line 77)

The bug

af.Nautilus(...).fit(model, analysis) does latent computation as part of
its standard update loop (via SearchUpdater._compute_latent_samples).
The set of latents to compute comes from config/latent.yaml on the
autoconf search path. A workspace that ships a latent.yaml enabling
total_lens_flux_mujy (e.g. autolens_profiling/config/latent.yaml, or
any workspace that opts into the full lensing latent catalogue) will
crash every search.fit() that constructs AnalysisImaging without
magzero=... — which is the standard path in every workspace tutorial
and SLaM pipeline today.

This is a crash-by-default bug for any such configuration. The user has to know
to do one of the following:

  • Pass magzero=<value> on every AnalysisImaging(...) construction
    (and learn what value is appropriate for their instrument), or
  • Disable the offending latents in a project config/latent.yaml
    override, or
  • Set PYAUTO_SKIP_LATENTS=1 (the workaround applied in
    autolens_profiling/hpc/batch_gpu/submit_imaging_mge_a100_hst_fp64
    in PR #30, "Profiling / Optimization - to do list").

None of these are discoverable from the failure mode — the search runs
to convergence, then dies in post-fit bookkeeping with no metric output.

Desired fix

(see plan above — superseded by the agreed raw-flux + soft-fail design)

Out of scope

  • Auto-defaulting magzero to anything (0.0, instrument-specific):
    the flux value would be silently wrong rather than absent. NaN is
    the honest answer when the user hasn't supplied a zero-point.
  • Reworking the _require_* pattern across other PyAutoLens latents
    (e.g. cosmology-dependent ones). Scope this PR to the magzero family
    only.
