Skip to content

fix: expose row-group admission controls#745

Open
eric-tramel wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
eric-tramel:etramel/fix/741-row-group-admission
Open

fix: expose row-group admission controls#745
eric-tramel wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
eric-tramel:etramel/fix/741-row-group-admission

Conversation

@eric-tramel
Copy link
Copy Markdown
Contributor

@eric-tramel eric-tramel commented Jun 5, 2026

📋 Summary

Adds public RunConfig.row_group_admission controls so async runs can tune fixed/adaptive row-group admission and active-row guardrails without reaching into engine internals. The scheduler now reports the configured/adaptive row-group state in capacity plans, rejects unsafe oversized row groups when a row budget is active, and releases checkpointed completion state without retaining unbounded row/drop summaries.

🔗 Related Issue

Fixes #741

🔄 Changes

  • Adds RowGroupAdmissionConfig / RowGroupAdmissionMode, public caps, validation, and lazy config exports.
  • Threads row-group admission mode, horizon, adaptive initial target, and max admitted rows from RunConfig into the async scheduler.
  • Preserves the historical default fixed horizon (3) as row-group-count-only, while widened fixed and adaptive modes derive a capped active-row guard when one is not explicit.
  • Bounds post-checkpoint completion summaries, aggregates fragmented dropped-row summaries, caches remaining cell counts, and cleans stale dispatched/deferred state after checkpoint release.
  • Updates architecture/Fern docs, capacity contracts, and the Issue Public async runs cannot tune row-group horizon/adaptive admission #741 plan.

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated (covered by async builder integration tests and large mock-provider scheduler runs)

Additional validation:

  • make check-all-fix
  • NODE_OPTIONS=--experimental-global-webcrypto make check-fern-docs DOCS_PYTHON_VERSION=3.12 (fern check: 0 errors, 1 warning)
  • Full package tests: config 634 passed; engine 2273 passed; interface 955 passed, 1 skipped
  • Focused completion/retryable cleanup tests: 50 passed
  • Large mock/scale probes: first fragmented 1M-row release aggregates without sorting/copying (peak_mb=0.386), alternating released summaries stay capped at 4096, reopened released ranges stay capped, 1M-row completion polling remains cache-backed, and 64x16k fixed/adaptive mock scheduler runs retain bounded summaries.
  • Review-nuke fallback: configured persona spawning was unavailable in this environment, so I used the seven-lens read-only fallback and iterated until the final narrow passes were clean.

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@eric-tramel eric-tramel requested a review from a team as a code owner June 5, 2026 20:56
@eric-tramel eric-tramel deployed to agentic-ci June 5, 2026 20:56 — with GitHub Actions Active
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Review: PR #745 — Expose row-group admission controls

Summary

This PR promotes the previously-internal row-group admission controls of AsyncTaskScheduler to a public RunConfig.row_group_admission policy (RowGroupAdmissionConfig + RowGroupAdmissionMode). It threads the public config through DatasetBuilder._prepare_async_run into the scheduler, applies the row-budget guard in both fixed and adaptive modes (previously adaptive-only), refactors a chunk of admission diagnostics emission, and adds matching architecture / Fern / docs/concepts/ documentation plus a plans/741/ design note.

The core API is small, well validated, and correctly preserves defaults: RunConfig() continues to behave like the prior fixed horizon of 3 row groups. Most of the volume is documentation and test coverage. The change is a clean, layered refactor — no import-direction violations and no leakage of engine internals into the config package.

Findings

Behavior change worth a release-note callout (medium)

_row_group_row_guard_allows previously short-circuited to True outside adaptive mode (async_scheduler.py old line: if not self._adaptive_row_group_admission: return True). It now applies unconditionally, with _max_admitted_rows derived as min(num_records, max(hard_cap * buffer_size, 8192)) when not explicit. The new test test_scheduler_derived_row_group_row_guard_blocks_extra_large_groups is parametrized over ["fixed", "adaptive"] and asserts the guard blocks 5,000-row groups under a derived guardrail of 8,192 — meaning fixed-mode runs that previously admitted oversized row groups will now be blocked.

In practice this is unlikely to bite typical users (buffer_size=1000, cap=3 gives a derived guard of 8,192 ≥ 3,000 worth of admitted rows, so the guard is a near no-op), but users running with a single row group materially larger than max(hard_cap * buffer_size, 8192) will see throttling they didn't before. The architecture doc describes the new guard, but neither the RunConfig.row_group_admission field docstring nor set_run_config mention that the default behavior in fixed mode tightened. Worth a one-line note in the architecture doc / changelog ("Fixed-mode runs now also enforce a derived max_admitted_rows guard; set max_admitted_rows explicitly to retain prior behavior or widen it").

_adaptive_row_group_block_reason lost its on-demand lookup (low)

Previously the function called _next_unadmitted_row_group_size() to scan self._row_groups for the next unadmitted group. Now the caller passes self._pending_row_group_admission_size, which is only set inside the _admit_row_groups loop while it's actively waiting for capacity (and cleared in finally). If _maybe_update_adaptive_row_group_target runs while _admit_row_groups is between iterations (after _dispatch_seeds, before reassignment), the value is stale-but-set; if _admit_row_groups has finished (_all_rgs_admitted) the early-return path covers it.

The cases are likely all benign because the actual admission loop independently re-evaluates the guard, but the new design ties an adaptive-target signal to the lifecycle of an unrelated coroutine variable. Two cheap mitigations: keep _next_unadmitted_row_group_size() (it's small), or document that _pending_row_group_admission_size is "current admission attempt, may be None between attempts" and accept no_pending_row_groups is a slightly noisier diagnostic. Not blocking.

max_admitted_rows or fallback (very minor)

async_scheduler.py:346:

self._max_admitted_rows = max_admitted_rows or self._max_admitted_rows_guardrail()

The constructor already validates max_admitted_rows >= 1, so or and is not None else are equivalent here. Prefer max_admitted_rows if max_admitted_rows is not None else self._max_admitted_rows_guardrail() for clarity — the or form silently falls through on a hypothetical 0 if the validator is ever loosened. Bikeshed.

Pydantic config — solid

  • RowGroupAdmissionConfig cleanly enforces adaptive_initial_target is rejected in fixed mode and bounded in adaptive mode.
  • RunConfig.validate_row_group_admission_budget correctly enforces max_admitted_rows >= buffer_size.
  • Lazy-import wiring in data_designer.config.__init__ matches the existing pattern (entry in both TYPE_CHECKING block and _LAZY_NAME_TO_MOD_OBJ).
  • _MAX_ROW_GROUP_ADMISSION_HORIZON = 64 and _MAX_EXPLICIT_ADMITTED_ROWS = 1_000_000 are reasonable safety caps; the values match the table in the docs.

Defaults preserve prior behavior — confirmed

  • RunConfig().row_group_admissionmode=fixed, max_concurrent_row_groups=3, matching the prior scheduler default (test_run_config_exposes_default_row_group_admission_policy, test_prepare_async_run_threads_row_group_admission_config[default_fixed]).
  • row_group_admission_source defaults to "default" in the scheduler signature; _prepare_async_run always passes "run_config" now, so the capacity-plan source flips from the old "dataset_builder" to "run_config" for production runs. Anything consuming that field on the public capacity plan should be aware. Worth checking if any telemetry consumer pins the literal "dataset_builder" for row-group-admission source.

Test coverage — strong

Six new config-level tests, one parametrized scheduler integration test (5 cases), two new async scheduler integration tests (fixed/adaptive parametrized), a default-row-guard scaling test, and an explicit-row-guard blocking test. Both fixed and adaptive paths are exercised end-to-end against MockSeedGenerator / SlowLLMBoundCellGenerator. The deletion of _next_unadmitted_row_group_size is implicitly covered by the adaptive-block-reason tests now passing next_size directly.

Style / structural compliance — clean

  • from __future__ import annotations: present in modified files.
  • Modern type syntax (int | None, list[str]).
  • Absolute imports only.
  • Import direction: dataset_builder.py correctly imports from data_designer.config.run_config; data_designer.interface.data_designer only updates a docstring. ✓
  • The _emit_row_group_admission_event extraction is a nice cleanup — three callsites collapsed, less drift between the event name and the diagnostics reason.

Doc parity

architecture/dataset-builders.md, docs/concepts/architecture-and-performance.md, and fern/versions/latest/pages/concepts/architecture-and-performance.mdx are all updated with consistent wording. Code snippet imports data_designer.config as dd and from data_designer.interface import DataDesigner — both supported.

Verdict

Approve with minor suggestions. The shape of the public API matches existing patterns in the codebase, defaults are preserved, validation is tight, and tests cover both modes. The two things worth addressing before merge:

  1. Note the fixed-mode row-guard tightening in the architecture doc / release notes (or relax the derived guard to remain a no-op for fixed mode unless explicitly set, if backward compatibility is intended).
  2. Either restore _next_unadmitted_row_group_size() or add a one-line comment explaining the staleness contract of _pending_row_group_admission_size in _adaptive_row_group_block_reason.

Neither is a blocker.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

This PR exposes RunConfig.row_group_admission as public API so async runs can tune fixed or adaptive row-group admission and active-row guardrails without reaching into scheduler internals. The config is threaded through DatasetBuilder._prepare_async_run into AsyncTaskScheduler, and the row-budget guard is now applied to both fixed-widened and adaptive horizons.

  • New config types: RowGroupAdmissionConfig / RowGroupAdmissionMode with Pydantic validation, lazy exports, and public caps (MAX_ROW_GROUP_ADMISSION_HORIZON, MAX_ROW_GROUP_ADMITTED_ROWS). The default fixed-3 behavior is preserved for existing callers.
  • Scheduler changes: Row guard is enforced uniformly (fixed or adaptive); _pending_row_group_admission_size replaces the O(n) _next_unadmitted_row_group_size scan; in_flight_count == 0 guards checkpoint to prevent premature row-group release; _dispatched is explicitly cleaned up for checkpointed row groups; DatasetGenerationError is now raised (rather than swallowed) on finalization failure.
  • CompletionTracker.release_row_group: Replaces per-row completion sets with memory-bounded range summaries after durable checkpoint, capping diagnostic state to MAX_EXACT_RELEASED_ROW_INTERVALS and MAX_RELEASED_ROW_GROUP_SUMMARY_RANGES.

Confidence Score: 5/5

Safe to merge. All changed execution paths are guarded by construction-time validation, the scheduler's asyncio concurrency invariants are preserved, and the default fixed-3 horizon behavior is unchanged for existing callers.

The row-group admission config is threaded cleanly from the public API through the builder into the scheduler. The _pending_row_group_admission_size try/finally pattern correctly prevents stale adaptive-target updates during the post-admit dispatch window. The in_flight_count == 0 checkpoint guard closes a race between task completion and row-group release. The CompletionTracker.release_row_group range-summary implementation handles all boundary conditions with thorough tests. No correctness issues were found in the changed paths.

No files require special attention. The most structurally complex change is completion.py's range-summary data structure, which is thoroughly covered by the new test suite.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py Threads row-group admission config through constructor; _pending_row_group_admission_size set/cleared in try/finally cleanly prevents stale adaptive-target updates; in_flight_count==0 guard on checkpoint is correct; _dispatched cleanup for checkpointed groups fixes the old skip-flag leak; DatasetGenerationError now propagated on finalization failure.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/scheduling/completion.py Adds release_row_group with memory-bounded range-summary data structures; _batch_complete changed from set to dict (column→size) for correct size-aware completion checks; _remaining_cell_rows lazy-init cache is correctly updated on mark_cell_complete and drop_row with idempotency guards; range-merge algorithm and _trim_released_summary_ranges are correct.
packages/data-designer-config/src/data_designer/config/run_config.py Adds RowGroupAdmissionMode, RowGroupAdmissionConfig, and related constants; adds RunConfig.validate_row_group_admission_budget cross-field validator; all validation logic is correct and well-tested.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py Correctly translates RunConfig.row_group_admission into AsyncTaskScheduler kwargs; defensive ValueError for un-normalized adaptive_initial_target is safe since the Pydantic validator always normalizes it.
packages/data-designer-engine/src/data_designer/engine/capacity.py Adds max_admitted_rows_source field to RowGroupAdmission dataclass for telemetry; minimal change, correct.

Sequence Diagram

sequenceDiagram
    participant User
    participant RunConfig
    participant DatasetBuilder
    participant AsyncTaskScheduler
    participant CompletionTracker

    User->>RunConfig: "RunConfig(row_group_admission=RowGroupAdmissionConfig(...))"
    RunConfig->>RunConfig: validate_adaptive_settings() / validate_row_group_admission_budget()

    User->>DatasetBuilder: build() / run_async()
    DatasetBuilder->>DatasetBuilder: _prepare_async_run() translate RowGroupAdmissionConfig to kwargs
    DatasetBuilder->>AsyncTaskScheduler: __init__(max_concurrent_row_groups, adaptive_row_group_admission, max_admitted_rows, row_group_admission_source)
    AsyncTaskScheduler->>AsyncTaskScheduler: _validate_row_group_row_budget()

    AsyncTaskScheduler->>AsyncTaskScheduler: run() launch _admit_row_groups task

    loop For each row group
        AsyncTaskScheduler->>AsyncTaskScheduler: "_pending = rg_size"
        AsyncTaskScheduler->>AsyncTaskScheduler: _wait_for_row_group_admission_capacity(rg_size)
        AsyncTaskScheduler->>AsyncTaskScheduler: "_pending = None (finally)"
        AsyncTaskScheduler->>AsyncTaskScheduler: acquire rg_semaphore, dispatch seeds
    end

    loop Main dispatch loop
        AsyncTaskScheduler->>AsyncTaskScheduler: _maybe_update_adaptive_row_group_target() early-exit if _pending is None
        AsyncTaskScheduler->>AsyncTaskScheduler: "_checkpoint_completed_row_groups() guard in_flight_count==0"
        AsyncTaskScheduler->>CompletionTracker: release_row_group(rg_id, rg_size, all_columns)
        CompletionTracker->>CompletionTracker: compress to _ReleasedRows range summary
        AsyncTaskScheduler->>AsyncTaskScheduler: rg_semaphore.release() + row_group_admission_event.set()
    end

    AsyncTaskScheduler->>User: capacity_plan() with max_admitted_rows_source
Loading

Reviews (2): Last reviewed commit: "Expose row-group admission controls" | Re-trigger Greptile

Comment on lines +1001 to 1037
diagnostics = self._row_group_admission_diagnostics(reason=reason)
if extra:
diagnostics |= extra
self._emit_scheduler_event(event_kind, diagnostics=diagnostics)
self._emit_scheduler_health_snapshot(event_kind)

async def _admit_row_groups(self) -> None:
"""Admit row groups as semaphore slots become available."""
all_admitted = True
for rg_id, rg_size in self._row_groups:
await self._wait_for_row_group_admission_capacity(rg_size)
if self._early_shutdown or self._fatal_worker_error is not None:
all_admitted = False
break
await self._rg_semaphore.acquire()
if self._early_shutdown or self._fatal_worker_error is not None:
self._rg_semaphore.release()
all_admitted = False
break
if not self._row_group_row_guard_allows(rg_size):
self._rg_semaphore.release()
try:
for rg_id, rg_size in self._row_groups:
self._pending_row_group_admission_size = rg_size
await self._wait_for_row_group_admission_capacity(rg_size)
if self._early_shutdown or self._fatal_worker_error is not None:
all_admitted = False
break
await self._rg_semaphore.acquire()
if self._early_shutdown or self._fatal_worker_error is not None:
self._rg_semaphore.release()
all_admitted = False
break
self._rg_states[rg_id] = _RowGroupState(size=rg_size)
self._rg_states[rg_id] = _RowGroupState(size=rg_size)

if self._buffer_manager is not None:
self._buffer_manager.init_row_group(rg_id, rg_size)
if self._buffer_manager is not None:
self._buffer_manager.init_row_group(rg_id, rg_size)

await self._dispatch_seeds(rg_id, rg_size)
self._emit_scheduler_event(
"row_group_admitted",
diagnostics=self._row_group_admission_diagnostics(reason="admitted")
| {"row_group": rg_id, "row_group_size": rg_size},
)
self._emit_scheduler_health_snapshot("row_group_admitted")
self._wake_event.set()
await self._dispatch_seeds(rg_id, rg_size)
self._emit_row_group_admission_event(
"row_group_admitted",
reason="admitted",
extra={"row_group": rg_id, "row_group_size": rg_size},
)
self._wake_event.set()
finally:
self._pending_row_group_admission_size = None
self._all_rgs_admitted = all_admitted
self._wake_event.set()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Stale pending size in the post-admit window for adaptive mode

Between self._rg_states[rg_id] = _RowGroupState(size=rg_size) and the start of the next loop iteration (which sets _pending_row_group_admission_size = next_rg_size), _pending_row_group_admission_size still holds the size of the just-admitted row group. During await self._dispatch_seeds(...), worker coroutines from earlier row groups can call _maybe_update_adaptive_row_group_target, which forwards this value to _adaptive_row_group_block_reason. The row-guard check then evaluates admitted_rows + just_admitted_size, double-counting the admitted group's rows, and may return "max_admitted_rows" even when the actual next pending row group would fit.

The old _next_unadmitted_row_group_size() walk always returned the true next-unadmitted size. The new approach avoids the O(n) scan but introduces this brief stale window. For uniform buffer_size workloads (the common case) the effect is invisible, but for heterogeneous row-group sizes with a tight max_admitted_rows budget, the adaptive target can be held back for one update cycle longer than necessary.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py
Line: 1001-1037

Comment:
**Stale pending size in the post-admit window for adaptive mode**

Between `self._rg_states[rg_id] = _RowGroupState(size=rg_size)` and the start of the next loop iteration (which sets `_pending_row_group_admission_size = next_rg_size`), `_pending_row_group_admission_size` still holds the size of the *just-admitted* row group. During `await self._dispatch_seeds(...)`, worker coroutines from earlier row groups can call `_maybe_update_adaptive_row_group_target`, which forwards this value to `_adaptive_row_group_block_reason`. The row-guard check then evaluates `admitted_rows + just_admitted_size`, double-counting the admitted group's rows, and may return `"max_admitted_rows"` even when the actual next pending row group would fit.

The old `_next_unadmitted_row_group_size()` walk always returned the true next-unadmitted size. The new approach avoids the O(n) scan but introduces this brief stale window. For uniform `buffer_size` workloads (the common case) the effect is invisible, but for heterogeneous row-group sizes with a tight `max_admitted_rows` budget, the adaptive target can be held back for one update cycle longer than necessary.

How can I resolve this? If you propose a fix, please make it concise.

Fixes NVIDIA-NeMo#741

Signed-off-by: Eric W. Tramel <1223539+eric-tramel@users.noreply.github.com>
@eric-tramel eric-tramel force-pushed the etramel/fix/741-row-group-admission branch from 1872d25 to abe044f Compare June 5, 2026 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Public async runs cannot tune row-group horizon/adaptive admission

1 participant