
fix hybrid model subblock param counting: all FFN sizes reported identical params #1258

Merged
kevalmorabia97 merged 5 commits into feature/puzzletron from jrausch/nemotron-h-fix-pattern-truncation on Apr 15, 2026

Contributor

@j-rausch j-rausch commented Apr 14, 2026

Summary

  • On hybrid models (e.g. Nemotron-H), calculate_subblock_params counts per-layer params by building a 1-layer model with num_hidden_layers=1.
  • It left hybrid_override_pattern at full length, so the 1-layer model always built layer 0 (pattern[0] = Mamba). Every FFN variant therefore reported the same Mamba param count, regardless of intermediate_size.
  • As a result, MIP could not differentiate between FFN sizes.

Fix

  • Truncate hybrid_override_pattern to the single character matching the subblock being measured before instantiating the 1-layer model.
  • For each iteration/layer, deep-copy the model config and build a per-layer model config with the truncated pattern.
  • The truncation activates only when hybrid_override_pattern is present; non-hybrid models (Llama, Qwen, etc.) are unaffected.
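
To make the failure mode concrete, here is a minimal toy reproduction. It is a sketch under stated assumptions, not the Puzzletron code: the config is a plain SimpleNamespace, count_one_layer_params and ffn_params are hypothetical helpers, the parameter formulas are invented, and the pattern letters (M = Mamba, - = FFN, * = attention) are assumed for illustration.

```python
from copy import deepcopy
from types import SimpleNamespace

# Hypothetical stand-in for a hybrid LM config; the real config has many more fields.
config = SimpleNamespace(
    hidden_size=8,
    intermediate_size=32,
    num_hidden_layers=4,
    hybrid_override_pattern="M-M*",  # assumed letters: M = Mamba, - = FFN, * = attention
)


def count_one_layer_params(cfg):
    """Toy 1-layer 'model': the layer type is taken from pattern[0]."""
    layer_type = cfg.hybrid_override_pattern[0]
    if layer_type == "-":  # FFN: up and down projections (toy formula)
        return 2 * cfg.hidden_size * cfg.intermediate_size
    return 100  # fixed toy count for every non-FFN layer


def ffn_params(cfg, layer_index, intermediate_size, truncate):
    per_layer = deepcopy(cfg)  # never mutate the shared config
    per_layer.num_hidden_layers = 1
    per_layer.intermediate_size = intermediate_size
    if truncate:
        # Keep only the character of the layer being measured.
        per_layer.hybrid_override_pattern = cfg.hybrid_override_pattern[layer_index]
    return count_one_layer_params(per_layer)


sizes = [16, 32, 64]
before = [ffn_params(config, 1, s, truncate=False) for s in sizes]
after = [ffn_params(config, 1, s, truncate=True) for s in sizes]
print(before)  # [100, 100, 100]
print(after)   # [256, 512, 1024]
```

Without truncation, layer index 1 is an FFN in the pattern, but the 1-layer model still reads pattern[0] and reports the toy Mamba count for every size. With truncation, the counts scale with intermediate_size.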

Summary by CodeRabbit

  • New Features

    • Per-subblock pattern truncation applied when computing subblock params to ensure correct per-layer selection.
  • Bug Fixes

    • Improved accuracy of parameter counting for hybrid FFN configurations, including validation of empty/invalid patterns.
    • Updated regression baselines for Nemotron teacher memory and parameter counts.
  • Documentation

    • Clarified docstring to note callers must adjust per-layer config before subblock calculations; renumbered inline pipeline step comments.
  • Tests

    • Added unit tests for pattern truncation and GPU validation tests for Nemotron-H parameter calculations.

… calculate_subblock_params reporting identical params for all FFN sizes on hybrid models

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
@j-rausch j-rausch requested a review from a team as a code owner April 14, 2026 15:45

copy-pr-bot Bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai Bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ea06beb4-9edb-46eb-aeac-f6c41d251d00

📥 Commits

Reviewing files that changed from the base of the PR and between de38282 and 9afd4a7.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/entrypoint.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
✅ Files skipped from review due to trivial changes (2)
  • modelopt/torch/puzzletron/entrypoint.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py

📝 Walkthrough

Adds ModelDescriptor.truncate_pattern_for_subblock to normalize and select a single-character hybrid override per layer, applies it to deep-copied per-subblock model configs during subblock stats computation, and adds unit and GPU tests validating truncation and FFN parameter counting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core descriptor: modelopt/torch/puzzletron/anymodel/model_descriptor/base.py | Added ModelDescriptor.truncate_pattern_for_subblock(lm_config, parent_layer_index=None) that strips `… |
| Subblock stats usage: modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py | Per-subblock: deep-copy model_config, call truncate_pattern_for_subblock on the copied LM config (using the parent layer index), and use the truncated copy for memory/params/active-param calculations. |
| Docstring update: modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py | Docstring clarified: callers must pre-adjust per-layer config fields (e.g., hybrid_override_pattern) before calling; references ModelDescriptor.truncate_pattern_for_subblock. No runtime behavior changed. |
| Unit tests: tests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py | New tests covering normal selection by index, stripping `… |
| GPU validation test: tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py | New GPU test that loads Nemotron-H config, extracts FFN indices from hybrid_override_pattern, truncates per-FFN subblock, computes FFN parameter counts for three intermediate_size variants, and asserts the three counts differ. |
| Regression baseline update: tests/gpu/torch/puzzletron/test_puzzletron.py | Adjusted expected teacher memory and parameter baselines for two Nemotron HF model IDs; no logic changes. |
| Entrypoint comments: modelopt/torch/puzzletron/entrypoint.py | Renumbered inline step comments in puzzletron(); no functional changes. |
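
The per-subblock step and the FFN index extraction summarized above could look roughly like the sketch below. It is illustrative only: ffn_layer_indices and per_subblock_configs are hypothetical helpers, the config is a SimpleNamespace rather than a real LM config, inline slicing stands in for ModelDescriptor.truncate_pattern_for_subblock, and the "-" marker for FFN layers is an assumption.

```python
from copy import deepcopy
from types import SimpleNamespace


def ffn_layer_indices(pattern, ffn_char="-"):
    """Indices of FFN layers in a hybrid pattern, ignoring '|' separators."""
    normalized = pattern.replace("|", "")
    return [i for i, ch in enumerate(normalized) if ch == ffn_char]


def per_subblock_configs(model_config, layer_indices):
    """Yield (index, config copy) pairs whose pattern is one character long."""
    normalized = model_config.hybrid_override_pattern.replace("|", "")
    for idx in layer_indices:
        cfg = deepcopy(model_config)  # the shared config stays untouched
        cfg.num_hidden_layers = 1
        cfg.hybrid_override_pattern = normalized[idx]
        yield idx, cfg


model_config = SimpleNamespace(num_hidden_layers=6, hybrid_override_pattern="M-|M*|M-")
indices = ffn_layer_indices(model_config.hybrid_override_pattern)
configs = dict(per_subblock_configs(model_config, indices))
print(indices)  # [1, 5]
print({i: c.hybrid_override_pattern for i, c in configs.items()})  # {1: '-', 5: '-'}
```

Copying per iteration keeps one subblock's truncation from leaking into the next measurement, which is the reason the walkthrough gives for deep-copying model_config.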

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.47%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main fix: resolving a bug where hybrid model subblock parameter counting was reporting identical parameters for all FFN sizes due to improper pattern truncation. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected: torch.load/numpy.load do not use dangerous parameters, trust_remote_code is dynamically determined, no eval/exec on untrusted input, no # nosec bypass comments, no new risky dependencies. |

Signed-off-by: jrausch <jrausch@nvidia.com>
Contributor

github-actions Bot commented Apr 14, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-15 17:18 UTC

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/anymodel/model_descriptor/base.py`:
- Around line 193-199: The code currently strips pipe separators with pattern =
pattern.replace("|", "") then indexes pattern[0], which raises IndexError if the
result is empty (e.g., "|||"); update the logic in the block handling
pattern/parent_layer_index to guard for an empty pattern after normalization:
after computing pattern = pattern.replace("|", ""), check if pattern is empty
and in that case set lm_config.hybrid_override_pattern to an appropriate safe
value (e.g., "" or None) and return (or leave unchanged), otherwise continue
with the existing parent_layer_index conditional and assignment to
lm_config.hybrid_override_pattern; reference the variables pattern,
parent_layer_index, and lm_config.hybrid_override_pattern when making the
change.

In `@tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py`:
- Around line 40-43: The nemotron_config fixture currently hardcodes
trust_remote_code=True; change it to depend on the nemotron_descriptor fixture
and pass its requires_trust_remote_code() value into load_model_config so the
descriptor drives the trust decision. Specifically, update the nemotron_config
fixture signature to accept nemotron_descriptor and call
load_model_config(MODEL_ID,
trust_remote_code=nemotron_descriptor.requires_trust_remote_code()), keeping
MODEL_ID and load_model_config as-is.
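
For reference, the empty-pattern guard requested in the first finding might be sketched as follows. This is not the merged implementation: truncate_pattern is a hypothetical stand-in for ModelDescriptor.truncate_pattern_for_subblock, the fallback to index 0 when parent_layer_index is None is inferred from the comment above, and raising ValueError is one choice among several (the finding itself suggests assigning a safe value and returning instead).

```python
from types import SimpleNamespace


def truncate_pattern(lm_config, parent_layer_index=None):
    """Hypothetical stand-in for the truncation helper, with an empty-pattern guard."""
    pattern = getattr(lm_config, "hybrid_override_pattern", None)
    if pattern is None:
        return  # non-hybrid config: nothing to do
    pattern = pattern.replace("|", "")
    if not pattern:
        # Guard: "|||" normalizes to "", so pattern[0] would raise IndexError.
        raise ValueError("hybrid_override_pattern has no layer characters")
    index = 0 if parent_layer_index is None else parent_layer_index
    if not 0 <= index < len(pattern):
        raise ValueError(f"layer index {index} is out of range for pattern {pattern!r}")
    lm_config.hybrid_override_pattern = pattern[index]


cfg = SimpleNamespace(hybrid_override_pattern="M|-|*")
truncate_pattern(cfg, parent_layer_index=1)
print(cfg.hybrid_override_pattern)  # -
```

Failing loudly on an empty or out-of-range pattern surfaces a malformed config at the call site; silently assigning "" or None would defer the error to model construction.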

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9c6c7a62-1477-4e19-a077-f220983bcdb4

📥 Commits

Reviewing files that changed from the base of the PR and between 3f41819 and b279014.

📒 Files selected for processing (5)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py
  • tests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py

Comment thread modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
Comment thread tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py Outdated

codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.25%. Comparing base (38d9522) to head (9afd4a7).
⚠️ Report is 24 commits behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1258      +/-   ##
======================================================
- Coverage               76.33%   76.25%   -0.08%     
======================================================
  Files                     454      454              
  Lines                   48025    48104      +79     
======================================================
+ Hits                    36660    36682      +22     
- Misses                  11365    11422      +57     
| Flag | Coverage Δ |
|---|---|
| examples | 41.93% <20.00%> (-0.02%) ⬇️ |
| gpu | 59.36% <93.33%> (+<0.01%) ⬆️ |
| unit | 51.85% <80.00%> (+0.06%) ⬆️ |

Collaborator

@kevalmorabia97 kevalmorabia97 left a comment

Minor comments. Otherwise LGTM

Comment thread tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py Outdated
Comment thread modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py Outdated
Comment thread modelopt/torch/puzzletron/anymodel/model_descriptor/base.py Outdated
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) April 15, 2026 10:14
@kevalmorabia97
Collaborator

/ok to test de38282

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97
Collaborator

/ok to test 9afd4a7

@kevalmorabia97 kevalmorabia97 merged commit 5e4c43e into feature/puzzletron Apr 15, 2026
44 of 45 checks passed
@kevalmorabia97 kevalmorabia97 deleted the jrausch/nemotron-h-fix-pattern-truncation branch April 15, 2026 17:17
2 participants