fix: WeakDataset / ShearYX2DIrregular round-trip via al.from_json #554

@Jammy2211

Overview

al.output_to_json(obj=weak_dataset, ...) writes a WeakDataset successfully, but al.from_json(file_path=...) fails to read it back with TypeError: VectorYX2DIrregular.__init__() missing 1 required positional argument: 'values'. The bug blocks any downstream consumer wanting to load a serialised shear catalogue, and currently forces the autolens_workspace/scripts/weak/fit.py tutorial to rebuild the dataset inline rather than load it from disk. Surfaced during PR PyAutoLens #525 / workspace #188 (step 3 of the weak-lensing series).

Plan

  • Add a values property to aa.VectorYX2DIrregular in PyAutoArray returning the underlying (N, 2) ndarray, mirroring the pattern already used by Grid2DIrregular.values and ArrayIrregular.values. This is the single root-cause fix — all consumers of to_dict / from_dict get it for free, including ShearYX2DIrregular (which subclasses VectorYX2DIrregular).
  • Add a PyAutoArray-side unit test that round-trips a VectorYX2DIrregular through output_to_json / from_json, since that's where the defective code lives and there are currently no dict round-trip tests for any irregular structure.
  • Add the WeakDataset regression test in PyAutoLens, exercising the full simulator → write → load path so the original failure mode is locked down.
  • Once the library PRs ship, simplify autolens_workspace/scripts/weak/fit.py to load the dataset via al.from_json instead of rebuilding it inline.

Detailed implementation plan

Affected Repositories

  • Jammy2211/PyAutoLens (primary — regression test + this issue)
  • Jammy2211/PyAutoArray (the actual fix)
  • Jammy2211/autolens_workspace (post-merge workspace follow-up)

Work Classification

Library work first (PyAutoArray + PyAutoLens), then workspace follow-up (autolens_workspace) after the library PRs ship.

Branch Survey

Repository           | Current Branch | Dirty?
./PyAutoArray        | main           | clean
./PyAutoLens         | main           | clean
./autolens_workspace | main           | dirty (pre-existing modified dataset/ binaries, unrelated to this task and not touched by this work)

Suggested branch: feature/weak-dataset-from-json
Worktree root: ~/Code/PyAutoLabs-wt/weak-dataset-from-json/ (created later by /start_library)
Conflict check: no other active task is holding any of these three repos.

Root cause (verified by live reproduction)

autoconf/dictable.py::instance_as_dict walks the constructor signature and emits getattr(obj, arg) for each arg that satisfies hasattr(obj, arg).

  • VectorYX2DIrregular.__init__(self, values, grid)
  • Grid2DIrregular defines values (→ self._array) — round-trips fine.
  • ArrayIrregular defines values (→ self.array) — round-trips fine.
  • VectorYX2DIrregular does NOT define a values property. So hasattr(vec, 'values') → False, only grid ends up in the serialized dict, and from_dict calls VectorYX2DIrregular(grid=…) → TypeError: missing 1 required positional argument: 'values'.
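
The failure mode described above can be reproduced without any of the PyAuto libraries. The following is a minimal sketch, not the real autoconf code: instance_as_dict and from_dict here are simplified stand-ins (the real ones also recurse into nested objects and record type paths), and GridLike / VectorLike are hypothetical classes that only mimic the constructor shapes listed above.

```python
import inspect


def instance_as_dict(obj):
    """Toy stand-in: keep each constructor argument the object exposes by name."""
    params = inspect.signature(type(obj).__init__).parameters
    arguments = {
        name: getattr(obj, name)
        for name in params
        if name != "self" and hasattr(obj, name)
    }
    return {"class": type(obj), "arguments": arguments}


def from_dict(d):
    """Toy stand-in: rebuild the object by calling its constructor with the saved arguments."""
    return d["class"](**d["arguments"])


class GridLike:
    """Exposes its constructor argument as a property, so it round-trips."""

    def __init__(self, values):
        self._array = list(values)

    @property
    def values(self):
        return self._array


class VectorLike:
    """Takes `values` and `grid`, but only `grid` is readable by that name afterwards."""

    def __init__(self, values, grid):
        self.array = list(values)
        self.grid = grid


grid = GridLike(values=[(0.0, 0.0), (1.0, 1.0)])
vec = VectorLike(values=[(0.1, 0.2), (0.3, 0.4)], grid=grid)

grid_dict = instance_as_dict(grid)   # contains "values"
vec_dict = instance_as_dict(vec)     # contains only "grid"

restored_grid = from_dict(grid_dict)

try:
    from_dict(vec_dict)
    error = None
except TypeError as exc:
    # The constructor is called without `values`, as in the reported traceback.
    error = exc
```

In this toy model the write side succeeds silently and the error only appears on load, which matches the behaviour reported for the real dataset.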

Live reproduction confirmed:

  1. The exact TypeError from the report is reproduced from a minimal aa.VectorYX2DIrregular(values=…, grid=…).
  2. Monkey-patching VectorYX2DIrregular.values = property(lambda self: self.array) makes the full WeakDataset simulator → output_to_json → from_json round-trip pass, with loaded.shear_yx coming back as ShearYX2DIrregular (subclass preserved).
  3. No dict() method exists on AbstractNDArray or AbstractVectorYX2D that would short-circuit the serialization path.
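
The monkey-patch check in item 2 can be illustrated with stand-in classes. This is a sketch under assumptions: VectorLike and ShearLike are hypothetical substitutes for VectorYX2DIrregular and ShearYX2DIrregular, and as_dict / from_dict are simplified imitations of the signature-walking serializer, not the autoconf implementation.

```python
import inspect


class VectorLike:
    """Stand-in base class: stores `values` under the attribute name `array`."""

    def __init__(self, values, grid):
        self.array = list(values)
        self.grid = list(grid)


class ShearLike(VectorLike):
    """Stand-in subclass that inherits the constructor unchanged."""


def as_dict(obj):
    """Keep only the constructor arguments that are readable as attributes."""
    params = inspect.signature(type(obj).__init__).parameters
    return {
        "class": type(obj),
        "arguments": {
            name: getattr(obj, name)
            for name in params
            if name != "self" and hasattr(obj, name)
        },
    }


def from_dict(d):
    return d["class"](**d["arguments"])


shear = ShearLike(values=[(0.1, 0.2)], grid=[(0.0, 0.0)])
keys_before = sorted(as_dict(shear)["arguments"])

# Patch the base class only; the subclass picks the property up by inheritance.
VectorLike.values = property(lambda self: self.array)

keys_after = sorted(as_dict(shear)["arguments"])
loaded = from_dict(as_dict(shear))
```

Because the property lives on the base class, existing instances and subclasses see it immediately, and the rebuilt object keeps the subclass type.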

Implementation Steps

Step 1 — PyAutoArray fix. Edit autoarray/structures/vectors/irregular.py. Add the property directly under __array_finalize__, before slim:

@property
def values(self):
    """
    The raw underlying ndarray of (y, x) vector components, shape [total_vectors, 2].
    """
    return self.array

Mirrors Grid2DIrregular.values and ArrayIrregular.values.

Step 2 — PyAutoArray unit test. Add a tmp_path-based test in test_autoarray/structures/vectors/test_vectors_irregular.py that:

  • builds VectorYX2DIrregular(values=[[0.1, 0.2], [0.3, 0.4]], grid=[[0.0, 0.0], [1.0, 1.0]])
  • runs output_to_json(obj=vec, file_path=tmp_path / "vec.json")
  • runs loaded = from_json(file_path=tmp_path / "vec.json")
  • asserts isinstance(loaded, VectorYX2DIrregular) and that loaded.array and loaded.grid.array match the originals element-wise

Step 3 — PyAutoLens regression test. Add test_autolens/weak/test_dataset.py::test_weak_dataset_json_round_trip(tmp_path):

  • build a small WeakDataset via al.SimulatorShearYX(noise_sigma=0.3, seed=1).via_tracer_from(tracer=…, grid=<small aa.Grid2DIrregular>)
  • al.output_to_json(obj=dataset, file_path=tmp_path / "dataset.json")
  • loaded = al.from_json(file_path=tmp_path / "dataset.json")
  • assert loaded.name == dataset.name
  • assert np.allclose(loaded.shear_yx.array, dataset.shear_yx.array) and np.allclose(loaded.shear_yx.grid.array, dataset.shear_yx.grid.array)
  • assert np.allclose(loaded.noise_map.array, dataset.noise_map.array)
  • assert isinstance(loaded.shear_yx, ShearYX2DIrregular) (subclass preserved through round-trip)
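
The assertions above could be collected into a helper along these lines. This is a sketch only: the helper name is hypothetical, and the dataset is taken as an argument so that the tracer and grid passed to the simulator stay unspecified, as in the plan. The subclass check compares types directly rather than importing ShearYX2DIrregular, whose public import path is not stated here.

```python
def assert_weak_dataset_round_trips(dataset, tmp_path):
    """Write `dataset` to JSON, load it back, and compare every component."""
    # Imports are local so this module still imports where the libraries are absent.
    import numpy as np
    import autolens as al

    file_path = tmp_path / "dataset.json"
    al.output_to_json(obj=dataset, file_path=file_path)
    loaded = al.from_json(file_path=file_path)

    assert loaded.name == dataset.name
    assert np.allclose(loaded.shear_yx.array, dataset.shear_yx.array)
    assert np.allclose(loaded.shear_yx.grid.array, dataset.shear_yx.grid.array)
    assert np.allclose(loaded.noise_map.array, dataset.noise_map.array)
    # The subclass must survive the round-trip, not decay to the base vector type.
    assert type(loaded.shear_yx) is type(dataset.shear_yx)
    return loaded
```

The regression test itself would then build the small WeakDataset with the simulator and call this helper with pytest's tmp_path.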

Step 4 — Workspace follow-up (after library PRs ship). In autolens_workspace/scripts/weak/fit.py, replace the inline truth_lens / truth_tracer / simulator.via_tracer_random_positions_from(...) reconstruction (and the accompanying docstring block explaining the workaround) with:

dataset = al.from_json(file_path=dataset_path / "dataset.json")

Apply the same change to the matching notebook if one exists.

Key Files

  • PyAutoArray/autoarray/structures/vectors/irregular.py — add values property
  • PyAutoArray/test_autoarray/structures/vectors/test_vectors_irregular.py — round-trip unit test
  • PyAutoLens/test_autolens/weak/test_dataset.py — WeakDataset round-trip regression test
  • autolens_workspace/scripts/weak/fit.py — workspace follow-up (post-merge)

Out of scope

  • ShearYX2D / aa.VectorYX2D (regular-grid variants) — constructor also takes mask, would need separate Mask2D round-trip work, and no current code path serializes them.
  • Pytree registration of WeakDataset for JAX compatibility — separate effort.
  • Reworking the simulator output format — dataset.json is the right artifact; only the loader is broken.

Risk / blast radius

  • Adding a values property cannot collide with state — no self.values = setter exists anywhere in VectorYX2DIrregular or its bases.
  • Grid2DIrregular already has the same property and is used identically by the dictable machinery; same pattern, no surprises.
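
To make the first risk concrete: a read-only property only causes trouble when something assigns to the same name. The sketch below uses two hypothetical classes to show the collision that is ruled out here and the safe pattern the fix follows; neither class is taken from PyAutoArray.

```python
class AssignsToValues:
    """Hypothetical class whose constructor assigns to a read-only property."""

    def __init__(self, values):
        self.values = values  # collides with the setter-less property below

    @property
    def values(self):
        return self.array


class ReadsValuesOnly:
    """Stores data under another name and only exposes `values` for reading."""

    def __init__(self, values):
        self.array = list(values)

    @property
    def values(self):
        return self.array


try:
    AssignsToValues(values=[1.0, 2.0])
    collision = None
except AttributeError as exc:
    # Assigning to a property without a setter raises AttributeError.
    collision = exc

safe = ReadsValuesOnly(values=[1.0, 2.0])
```

The second class corresponds to the situation described above, where nothing in the class or its bases assigns to self.values.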

Original Prompt

Fix al.from_json round-trip for WeakDataset / ShearYX2DIrregular.

The simulator script at @autolens_workspace/scripts/weak/simulator.py writes the simulated WeakDataset to
dataset/weak/simple/dataset.json via al.output_to_json, but al.from_json(file_path=...) cannot read it
back. The traceback is:

TypeError: VectorYX2DIrregular.__init__() missing 1 required positional argument: 'values'

at PyAutoFit/autofit/mapper/model_object.py:223 (return cls_(...)), called from
PyAutoConf/autoconf/dictable.py:316.

The bug surfaced when writing @autolens_workspace/scripts/weak/fit.py (PR PyAutoLens #525 / workspace #188,
step 3 of the weak-lensing series). That tutorial works around it by rebuilding the dataset inline using
SimulatorShearYX(seed=1) rather than loading from disk — but that's a real limitation: any downstream
consumer wanting to load a serialised shear catalogue is blocked.

Root cause hypothesis (inspect to confirm)

ShearYX2DIrregular inherits from aa.VectorYX2DIrregular (see @PyAutoGalaxy/autogalaxy/util/shear_field.py).
The from_dict round-trip for irregular vector fields likely needs the underlying values and grid arrays
keyed correctly in the serialised dict. The generic dictable mechanism in PyAutoConf probably doesn't
know about the values / grid constructor convention used by aa.VectorYX2DIrregular.

Suggested approach

  1. Look at how aa.Grid2DIrregular round-trips (it must, since dataset.positions is just a property on
    dataset.shear_yx.grid). If Grid2DIrregular has a custom __init_subclass__, to_dict, or from_dict
    hook, mirror that pattern for VectorYX2DIrregular and ShearYX2DIrregular.
  2. Alternatively: add a custom serializer for WeakDataset in @PyAutoLens/autolens/weak/dataset.py that
    handles its three components (shear_yx, noise_map, name) by their public types.
  3. Add a regression test under @PyAutoLens/test_autolens/weak/test_dataset.py:
    • build a WeakDataset via SimulatorShearYX
    • al.output_to_json(obj=dataset, file_path=tmp_path / "dataset.json")
    • loaded = al.from_json(file_path=tmp_path / "dataset.json")
    • assert loaded.shear_yx == dataset.shear_yx, loaded.noise_map == dataset.noise_map,
      loaded.name == dataset.name

Workspace impact

Once fixed, the @autolens_workspace/scripts/weak/fit.py tutorial can be simplified to load from disk
rather than rebuilding the dataset inline. That migration is a small workspace follow-up after the library
fix lands.

Out of scope

  • Pytree registration of WeakDataset for JAX compatibility — a separate effort.
  • Reworking the simulator output format — dataset.json is the right artifact; only the loader is broken.
