fix hybrid model subblock param counting: all FFN sizes reported identical params#1258
Conversation
… calculate_subblock_params reporting identical params for all FFN sizes on hybrid models Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (2)
📝 WalkthroughWalkthroughAdds ModelDescriptor.truncate_pattern_for_subblock to normalize and select a single-character hybrid override per layer, applies it to deep-copied per-subblock model configs during subblock stats computation, and adds unit and GPU tests validating truncation and FFN parameter counting. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: jrausch <jrausch@nvidia.com>
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/anymodel/model_descriptor/base.py`:
- Around line 193-199: The code currently strips pipe separators with pattern =
pattern.replace("|", "") then indexes pattern[0], which raises IndexError if the
result is empty (e.g., "|||"); update the logic in the block handling
pattern/parent_layer_index to guard for an empty pattern after normalization:
after computing pattern = pattern.replace("|", ""), check if pattern is empty
and in that case set lm_config.hybrid_override_pattern to an appropriate safe
value (e.g., "" or None) and return (or leave unchanged), otherwise continue
with the existing parent_layer_index conditional and assignment to
lm_config.hybrid_override_pattern; reference the variables pattern,
parent_layer_index, and lm_config.hybrid_override_pattern when making the
change.
In `@tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py`:
- Around line 40-43: The nemotron_config fixture currently hardcodes
trust_remote_code=True; change it to depend on the nemotron_descriptor fixture
and pass its requires_trust_remote_code() value into load_model_config so the
descriptor drives the trust decision. Specifically, update the nemotron_config
fixture signature to accept nemotron_descriptor and call
load_model_config(MODEL_ID,
trust_remote_code=nemotron_descriptor.requires_trust_remote_code()), keeping
MODEL_ID and load_model_config as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9c6c7a62-1477-4e19-a077-f220983bcdb4
📒 Files selected for processing (5)
modelopt/torch/puzzletron/anymodel/model_descriptor/base.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.pytests/gpu/puzzletron/test_nemotron_h_gpu_validation.pytests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feature/puzzletron #1258 +/- ##
======================================================
- Coverage 76.33% 76.25% -0.08%
======================================================
Files 454 454
Lines 48025 48104 +79
======================================================
+ Hits 36660 36682 +22
- Misses 11365 11422 +57
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
kevalmorabia97
left a comment
There was a problem hiding this comment.
Minor comments. Otherwise LGTM
Signed-off-by: jrausch <jrausch@nvidia.com>
|
/ok to test de38282 |
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/ok to test 9afd4a7 |
Summary
calculate_subblock_paramsbuilds a 1-layer model to count per-layer params by settingnum_hidden_layers=1hybrid_override_patternat full length, so the 1-layer model always built layer 0 (pattern[0]= Mamba). every FFN variant reported the same Mamba param count regardless ofintermediate_sizeFix
hybrid_override_patternto the single character matching the subblock being measured before instantiating the 1-layer modelhybrid_override_patternis present; non-hybrid models (Llama, Qwen, etc.) are unaffectedSummary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests