Make HOURS_VALUES a host array to avoid import-time GPU preallocation by hmgaudecker · Pull Request #13 · OpenSourceEconomics/aca-model

hmgaudecker · 2026-06-21T14:22:56Z

`HOURS_VALUES = jnp.array(...)` was a module-level JAX array: importing the model materialized it on the default device, and under `XLA_PYTHON_CLIENT_PREALLOCATE=true` that first array op reserved 95% of device 0 in every importing process. That includes the MSM estimation's pytask orchestrator, which only `srun`s GPU ranks and must leave the devices free. The orchestrator thus starved the rank's pool reservation, which surfaced as the device-0 OOM. `HOURS_VALUES` is now a host (NumPy) array, converted to JAX at the indexing sites. No numerical change.

🤖 Generated with Claude Code

The module-level `HOURS_VALUES = jnp.array(...)` materialized on the default device at import. With XLA_PYTHON_CLIENT_PREALLOCATE=true that first array op reserves 95% of device 0 in every process that imports the model — including the MSM estimation's pytask orchestrator, which only `srun`s GPU ranks and must leave the devices free for them. The orchestrator thus starved the rank's pool reservation, surfacing as the device-0 OOM. Make HOURS_VALUES a host (NumPy) array, converted to JAX at the indexing sites where the value folds into the surrounding compiled function. No numerical change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
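A minimal sketch of the host-array pattern, assuming illustrative hour values and a hypothetical `hours_worked` lookup (neither is taken from the repository). The constant stays a NumPy array at module level, so importing the module touches no device; the conversion with `jnp.asarray` happens inside the jitted function.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Host-side constant: creating it is a pure NumPy operation, so importing
# this module does not initialize a JAX device.
# The values are illustrative, not the model's actual hours grid.
HOURS_VALUES = np.array([0.0, 1000.0, 2000.0])


@jax.jit
def hours_worked(labor_supply_index):
    # Convert at the indexing site. Under jit, the NumPy array is embedded
    # as a constant of the traced computation.
    return jnp.asarray(HOURS_VALUES)[labor_supply_index]


# Device work starts only here, when a process actually calls the function.
hours = hours_worked(np.array([0, 2, 1]))
```

A process that only imports the module, such as an orchestrator, never runs a JAX array op under this layout.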

…ery state

The three leisure functions returned a raw `time_endowment - losses` with no floor. Once work costs reached the endowment, leisure went to zero or negative and fed a non-positive base into the CRRA aggregator (NaN utility), and the kink made the MSM objective non-smooth for the derivative-free optimizer. A shared `_smooth_leisure_floor` helper applies a scaled softplus (`smoothing * logaddexp(0, available / smoothing)`, smoothing = 1% of the endowment) so leisure bends smoothly to 0+ instead. It reduces to `available` in the bulk, so existing estimates are preserved; the fixed-cost/reentry parameters stay identified when the optimizer drives them high. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
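The scaled softplus can be sketched in plain Python as follows. The function signature, the `rel_smoothing` argument, and the 5000-hour endowment are assumptions for illustration; the repository's `_smooth_leisure_floor` may differ, and in JAX the stable term would presumably come from `jnp.logaddexp(0.0, x)`.

```python
import math


def smooth_leisure_floor(available, time_endowment, rel_smoothing=0.01):
    """Scaled softplus: smoothing * logaddexp(0, available / smoothing)."""
    smoothing = rel_smoothing * time_endowment
    x = available / smoothing
    # Numerically stable logaddexp(0, x) = log(1 + exp(x)).
    log_one_plus_exp = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return smoothing * log_one_plus_exp


TIME_ENDOWMENT = 5000.0  # hypothetical annual hours

# Far from the floor, the function returns `available` almost exactly.
bulk = smooth_leisure_floor(2000.0, TIME_ENDOWMENT)

# When losses exceed the endowment, leisure stays small but strictly positive.
exhausted = smooth_leisure_floor(-500.0, TIME_ENDOWMENT)
```

Because the result is strictly positive for every finite input, a CRRA aggregator never receives a non-positive base.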

The softplus leisure floor keeps leisure strictly positive in every state-action cell, so the `positive_leisure` feasibility constraint never binds. Remove it from the canwork retiree/nongroup/tied regime builders and delete the unused `positive_leisure` helper; feasibility is now carried by the smooth floor in the leisure functions themselves. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

BQSEGM solves one 1-D consumption/savings regime with at most one discrete action, so unlike DC-EGM it attaches per regime: solver="bqsegm" gives the M1 slice regime nongroup_nomc_inelig_canwork a BQSEGM config (budget node `resources`, post-decision `savings`, the savings form shared with DC-EGM) and leaves every other living regime on brute force. The three savings-form gates in _common accept "bqsegm" alongside "dcegm"; the HIS builders attach whichever EGM solver is present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
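The per-regime attachment can be pictured with a small dispatch sketch. The function names and the placeholder regime name are hypothetical and are not the repository's API; only the `solver` strings and the M1 regime name come from the description above.

```python
M1_REGIME = "nongroup_nomc_inelig_canwork"

# Solvers that use the shared savings form (assumed from the gates above).
SAVINGS_FORM_SOLVERS = ("dcegm", "bqsegm")


def solver_for_regime(regime_name, solver):
    """Return which solver a living regime ends up with (illustrative)."""
    if solver == "dcegm":
        # DC-EGM attaches to every living regime.
        return "dcegm"
    if solver == "bqsegm" and regime_name == M1_REGIME:
        # BQSEGM attaches only to the M1 slice regime.
        return "bqsegm"
    return "brute"


def uses_savings_form(solver):
    """Gate that accepts "bqsegm" alongside "dcegm"."""
    return solver in SAVINGS_FORM_SOLVERS
```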

The BQSEGM case-piece envelope handles at most one discrete action, so the M1 vertical slice drops buy_private as a choice and binds it to BuyPrivate.yes in its consumers (premium, OOP) via the dags remove-and-fix convention. The brute and DC-EGM paths keep buy_private as an action. The drop_labor_supply hook is added for the next step (the pure-continuous slice also fixes labor_supply). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
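As a rough illustration of removing a choice and fixing its value in the consumers, the sketch below uses `functools.partial` as a stand-in; it is not the dags mechanism itself. The `premium` function, its `base_premium` argument, and the action list are hypothetical.

```python
from enum import Enum
from functools import partial


class BuyPrivate(Enum):
    no = 0
    yes = 1


def premium(buy_private, base_premium):
    """Hypothetical consumer of the buy_private choice."""
    return base_premium if buy_private is BuyPrivate.yes else 0.0


# Brute and DC-EGM paths: buy_private remains an action.
ACTIONS = ["buy_private", "labor_supply", "consumption"]

# M1 slice: remove the action and bind its value in the consumer.
slice_actions = [name for name in ACTIONS if name != "buy_private"]
premium_fixed = partial(premium, buy_private=BuyPrivate.yes)

# The consumer no longer needs buy_private from the caller.
cost = premium_fixed(base_premium=300.0)
```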

The pure-continuous M1 slice drops labor_supply as a discrete action and supplies it as a fixed full-time node (LaborSupply.h2000) read by labor income, AIME accrual, and the lagged-supply transition. lagged_labor_supply stays a state so the cross-regime continuation space is unchanged; the nomc regime's lifecycle transition is unaffected (it never gated on labor). With buy_private already fixed, the M1 regime now has no discrete action, so the only choice is continuous consumption against the cliffed budget. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…tax cliff

The M1 vertical-slice regime `nongroup_nomc_inelig_canwork` carries `assets` (the Euler axis) alongside `aime` and the stochastic shock grids, so the BQSEGM config names `continuous_state="assets"`; the rest ride along. The progressive federal income tax is declared as a piecewise-affine schedule on `gross_income`, kinking at each finite bracket edge `income_tax_schedule.brackets_upper[spousal_income, k]`, so BQSEGM differentiates the budget per declared bracket. The decorator is metadata-only: the brute-force and DC-EGM paths are unaffected by it. With these the full model constructs under `solver="bqsegm"` (M1 via BQSEGM, every other living regime on brute force). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
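To make "piecewise-affine, kinking at each finite bracket edge" concrete, here is a small sketch with a hypothetical three-bracket schedule. The edges and rates are invented, and the `spousal_income` index of the real schedule is omitted.

```python
import math

# Hypothetical schedule: upper bracket edges and marginal rates.
# The last edge is infinite, so only the finite edges are kinks.
BRACKETS_UPPER = [10_000.0, 40_000.0, math.inf]
MARGINAL_RATES = [0.10, 0.20, 0.30]


def income_tax(gross_income):
    """Piecewise-affine tax: each bracket taxes its own slice of income."""
    tax, lower = 0.0, 0.0
    for upper, rate in zip(BRACKETS_UPPER, MARGINAL_RATES):
        tax += rate * max(0.0, min(gross_income, upper) - lower)
        lower = upper
    return tax


def marginal_rate(gross_income):
    """Slope of the schedule on the bracket containing gross_income."""
    for upper, rate in zip(BRACKETS_UPPER, MARGINAL_RATES):
        if gross_income < upper:
            return rate
    return MARGINAL_RATES[-1]


# The schedule is continuous, but its slope jumps at these points.
kinks = [edge for edge in BRACKETS_UPPER if math.isfinite(edge)]
```

Within a bracket the budget derivative with respect to `gross_income` is constant, which is why a solver can differentiate the budget bracket by bracket.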

DC-EGM solves every living regime, so its solver-contract functions and the dropped borrowing constraint are broadcast model-wide. BQSEGM solves only the M1 regime, so broadcasting them forced every brute regime to supply the solver's `marginal_continuation` and stripped their borrowing constraint. Keep the model-level broadcast for DC-EGM only. The M1 regime carries the savings-form budget functions (`resources`, `savings`) at regime level and masks the borrowing constraint — BQSEGM enforces the borrowing limit through its savings grid's lower bound. BQSEGM inverts the Euler equation internally (CRRA from the utility parameters), so unlike DC-EGM it does not carry `inverse_marginal_utility`; a new `build_bqsegm_functions` supplies just the budget pair. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
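For readers unfamiliar with the inversion, the following is a textbook endogenous-grid step under CRRA utility, not BQSEGM's actual implementation. All parameters, the income value, and the `next_marginal_utility` stand-in (next period consumes everything) are assumptions. The borrowing limit enters only as the lower bound of the savings grid.

```python
BETA, GROSS_RETURN, GAMMA = 0.95, 1.03, 2.0  # hypothetical parameters
INCOME = 1.0                                 # hypothetical next-period income
BORROWING_LIMIT = 0.0

# The savings grid starts at the borrowing limit, so no node violates it.
SAVINGS_GRID = [BORROWING_LIMIT + 0.5 * i for i in range(5)]


def marginal_utility(consumption):
    """CRRA marginal utility u'(c) = c ** (-gamma)."""
    return consumption ** (-GAMMA)


def inverse_marginal_utility(value):
    """Closed-form inverse of u' under CRRA."""
    return value ** (-1.0 / GAMMA)


def next_marginal_utility(savings):
    # Stand-in continuation: next period consumes all resources.
    return marginal_utility(GROSS_RETURN * savings + INCOME)


def egm_step():
    """Return (resources, consumption) pairs on the endogenous grid."""
    points = []
    for savings in SAVINGS_GRID:
        rhs = BETA * GROSS_RETURN * next_marginal_utility(savings)
        consumption = inverse_marginal_utility(rhs)  # Euler inversion
        points.append((consumption + savings, consumption))
    return points


policy_points = egm_step()
```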

The BQSEGM continuation integrates the child's stochastic next-states over their joint node mesh. Reading the mesh in one pass scales the peak intermediate with the full ride-along × node × child-grid product, which on the M1 slice (aime=38 fixed PIA-bend grid × many ride-along axes) is enormous. Expose `n_bqsegm_stochastic_node_batch_size` on `GridConfig` and pass it into the solver, so a memory-rich device reads the whole mesh in one pass (0, the default) while a tighter budget loops over it in blocks. Together with the upstream pylcm splay threading, this resolves the M1 solve OOM. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
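The batching idea can be sketched in plain Python as a weighted sum over nodes, evaluated in blocks. The function name and the flat node lists are hypothetical; the only convention taken from the description is that a batch size of 0 means a single pass.

```python
def integrate_over_nodes(node_values, node_weights, batch_size=0):
    """Weighted sum over stochastic nodes, evaluated block by block.

    batch_size=0 reads all nodes in one pass; a positive value bounds how
    many per-node products are held in memory at once.
    """
    n_nodes = len(node_values)
    if n_nodes == 0:
        return 0.0
    block = n_nodes if batch_size <= 0 else batch_size
    total = 0.0
    for start in range(0, n_nodes, block):
        stop = start + block
        # Only this block's products are materialized at a time.
        products = [
            weight * value
            for value, weight in zip(
                node_values[start:stop], node_weights[start:stop]
            )
        ]
        total += sum(products)
    return total


values = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [0.1, 0.2, 0.3, 0.2, 0.2]
one_pass = integrate_over_nodes(values, weights)                # whole mesh
blocked = integrate_over_nodes(values, weights, batch_size=2)   # blocks of 2
```

Both calls return the same expectation up to floating-point summation order; only the size of the intermediate differs.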

hmgaudecker mentioned this pull request Jun 21, 2026

Add target_batch_size to to_dataframe for sharded target eval OpenSourceEconomics/pylcm#389

Closed

hmgaudecker and others added 8 commits June 25, 2026 12:24

hmgaudecker wants to merge 9 commits into main from fix/labor-market-host-hours-constant
