Dkorzekwa/any model other models #1007

Merged
danielkorzekwa merged 75 commits into feature/puzzletron from dkorzekwa/any_model_other_models
Mar 17, 2026


@danielkorzekwa danielkorzekwa commented Mar 9, 2026

What does this PR do?

Merging dkorzekwa/any_model_other_models into dkorzekwa/mip_and_realize_models - this MR is only for reviewing. Ultimately dkorzekwa/any_model_other_models should be merged into feature/puzzletron once dkorzekwa/mip_and_realize_models is merged there.

Summary by CodeRabbit

  • New Features

    • Added support for multiple model architectures: Mistral Small, Nemotron H, Nemotron H v2, Qwen2, Qwen3 8B, and Qwen3 VL 30B.
    • Introduced new pruning configurations and optimization pipelines for supported models.
    • Added comprehensive model descriptor framework enabling automated weight conversion and configuration handling.
    • Extended support for Mixture of Experts (MoE) models with expert removal pruning capabilities.
  • Tests

    • Enhanced test coverage with parametrized configurations for multiple model variants.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 9, 2026 16:03
… if now test_puzzletron.py will be repeatable.

Collaborator

Why do we call this nemotron_h and not nemotron_h_v3? Do we know if this will be the same for v4 as well?

Contributor Author

The names are changing so fast; I added a TODO to unify them.

Collaborator

Why is this qwen3_8b and not qwen3? All other models have a generic converter that is not tied to one specific variant.

Contributor Author

A model descriptor can be specific to a single variant. Within the same model family, different sizes can differ, e.g., in how model weights are named or structured. This one was tested only on Qwen3 8B, hence the name for now.

Collaborator

Same comment: why qwen3_vl_30b and not qwen3_vl?

Contributor Author

Because it was tested only for this particular model.

Comment thread modelopt/torch/puzzletron/sewing_kit/core.py Outdated
Comment thread modelopt/torch/puzzletron/sewing_kit/core.py Outdated
Comment thread modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py Outdated
Comment thread modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py Outdated
Comment thread modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py Outdated
@danielkorzekwa danielkorzekwa force-pushed the dkorzekwa/any_model_other_models branch from 3866125 to 27866de Compare March 16, 2026 19:05
    # This prevents NaN values in uninitialized parameters (e.g., backbone.layers.1.mixer.gate.weight
    # in nemotron-3-nano-30b-a3b-base-bf16) that can occur with from_config on RTX GPU cards (not on H100)

…reproducible on CI)

…YAMLs (#1039)

### What does this PR do?

Type of change: New tests / Refactoring

Simplifies the puzzletron test infrastructure by:

1. **Removing `hf_configs/` folder** — HuggingFace configs are now
loaded on-the-fly via `AutoConfig.from_pretrained(hf_model_name)`
instead of from cached static files.

2. **Removing `HF_MODEL_CARD_NAMES` mapping** — HF model names (e.g.
`meta-llama/Llama-3.1-8B-Instruct`) are passed directly as test
parameters.

3. **Replacing hardcoded VL model check** with `hasattr(config,
"text_config") and hasattr(config, "vision_config")` for generic
detection.

4. **Unifying ~6k lines of near-identical YAML** into shared base
configs with per-model overrides:
- `validate_model_defaults.yaml`, `validate_solutions_defaults.yaml` —
shared validation params
- `pruning/pruning_defaults.yaml`, `pruning/ffn_pruning_base.yaml`,
`pruning/attn_pruning.yaml`, `pruning/hidden_dim_pruning.yaml` — shared
pruning bases
- Per-model dirs now follow HF model card paths
(`meta-llama/Llama-3.1-8B-Instruct/`) and contain only model-specific
overrides (e.g. just the `layer_descriptor._target_` class)

5. **Removing `hydra_config_subdir` parameter** from test parametrize —
config path is derived from `hf_model_name` directly.

6. **Removing unused `bypass:` entries** from all per-model main YAMLs.
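
The generic vision-language check from item 3 can be sketched as follows. This is a minimal illustration, not the code from the PR: `is_vision_language_config` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for whatever `AutoConfig.from_pretrained(hf_model_name)` would return, so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def is_vision_language_config(config) -> bool:
    """Return True when the config exposes both a text and a vision sub-config.

    Hypothetical helper; it only wraps the hasattr check quoted in item 3.
    """
    return hasattr(config, "text_config") and hasattr(config, "vision_config")


# Stand-in objects (assumptions); real configs would come from AutoConfig.
text_only = SimpleNamespace(hidden_size=4096)
vision_language = SimpleNamespace(
    text_config=SimpleNamespace(hidden_size=2048),
    vision_config=SimpleNamespace(hidden_size=1152),
)

print(is_vision_language_config(text_only))        # False
print(is_vision_language_config(vision_language))  # True
```

Checking for attributes rather than comparing model names means a new VL model should be picked up without touching the test, provided its config follows the same `text_config` / `vision_config` convention.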

### Usage

```python
# Test parametrize now uses HF model names directly:
("meta-llama/Llama-3.1-8B-Instruct", "llama", None, False),
```

### Testing

All 8 parametrized test cases in `test_puzzletron.py` pass:
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen3-8B
- Qwen/Qwen3-VL-30B-A3B-Instruct
- mistralai/Mistral-Small-24B-Instruct-2501
- nvidia/NVIDIA-Nemotron-Nano-12B-v2
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
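
Because per-model config directories follow the HF model card path, each name in the list above can double as a relative config directory. The sketch below shows one way that mapping might look; `config_dir_for` and the `configs` root are hypothetical names chosen for illustration, and the actual layout in the repository may differ.

```python
from pathlib import PurePosixPath

# Model names copied from the test list above.
HF_MODEL_NAMES = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "meta-llama/Llama-3.2-3B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
    "Qwen/Qwen3-8B",
    "Qwen/Qwen3-VL-30B-A3B-Instruct",
    "mistralai/Mistral-Small-24B-Instruct-2501",
    "nvidia/NVIDIA-Nemotron-Nano-12B-v2",
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16",
]


def config_dir_for(hf_model_name: str, configs_root: str = "configs") -> PurePosixPath:
    """Map 'org/model' to '<configs_root>/org/model'.

    Hypothetical helper: 'configs_root' is an assumed location, not taken from the PR.
    """
    org, model = hf_model_name.split("/", 1)
    return PurePosixPath(configs_root) / org / model


for name in HF_MODEL_NAMES:
    print(config_dir_for(name))
```

With this kind of mapping, adding a model to the test only needs a new directory named after its model card, with no separate lookup table to maintain.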

CI Job:
https://github.com/NVIDIA/Model-Optimizer/actions/runs/23087216443/job/67065820836

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: N/A (test-only changes)
- If you copied code from any other source, did you follow IP policy in
[CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?:
N/A
- Did you write any new necessary tests?: N/A (refactoring existing
tests)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A

### Additional Information

Hydra packaging notes (non-obvious fixes required):
- Added `# @package _global_` to all per-model main YAMLs — needed when
`config_name` contains path separators, otherwise Hydra nests all keys
under the org/model package
- Added `@_here_` to sub-defaults inside `pruning/` configs — prevents
Hydra from compounding the `pruning` package at each inheritance level
(`pruning` → `pruning.pruning` → `pruning.pruning.pruning`)
- Moved `hydra/hydra_logging=disabled` from YAML defaults list to
`overrides=` in `puzzletron.py` — the YAML override syntax broke with
nested config paths

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Comment thread .pre-commit-config.yaml Outdated
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the dkorzekwa/any_model_other_models branch from 5b3d97d to 73eb9a8 Compare March 17, 2026 10:08
@danielkorzekwa danielkorzekwa merged commit 1b42f0b into feature/puzzletron Mar 17, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/any_model_other_models branch March 17, 2026 10:56
kevalmorabia97 added a commit that referenced this pull request Apr 15, 2026
### What does this PR do?

Implement puzzletron compression algorithm based on Puzzle paper
(https://arxiv.org/abs/2411.19146)

<details>
<summary>The list of reviewed and merged MRs that resulted in the
feature/puzzletron branch</summary>

Merging dkorzekwa/any_model to feature/puzzletron

[Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull
Request #974 ·
NVIDIA/Model-Optimizer](#974)
- merged

[Draft: anymodel activation scoring by danielkorzekwa · Pull Request
#989 ·
NVIDIA/Model-Optimizer](#989)
- merged

[Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 ·
NVIDIA/Model-Optimizer](#990)
- merged

[Draft: Merging anymodel:build_library_and_stats by danielkorzekwa ·
Pull Request #993 ·
NVIDIA/Model-Optimizer](#993)
- merged

[Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull
Request #994 ·
NVIDIA/Model-Optimizer](#994)
- merged

[Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull
Request #995 ·
NVIDIA/Model-Optimizer](#995)
- merged

[Dkorzekwa/any model other models by danielkorzekwa · Pull Request
#1007 ·
NVIDIA/Model-Optimizer](#1007)
- merged

PR to 1007: #1039 - merged

[Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 ·
NVIDIA/Model-Optimizer](#1020)
- merged

[Merge any_model tutorial by danielkorzekwa · Pull Request #1035 ·
NVIDIA/Model-Optimizer](#1035)
- merged

[Merge mbridge distillation for any_model by danielkorzekwa · Pull
Request #1036 ·
NVIDIA/Model-Optimizer](#1036)
- merged

[MR branch for the remaining difference between dkorzekwa/any_model an…
by danielkorzekwa · Pull Request #1047 ·
NVIDIA/Model-Optimizer](#1047)
- merged

[Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071
·
NVIDIA/Model-Optimizer](#1071)
- merged

[Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request
#1073 ·
NVIDIA/Model-Optimizer](#1073)
- merged

[Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request
#1085 ·
NVIDIA/Model-Optimizer](#1085)
- merged

[Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull
Request #1102 ·
NVIDIA/Model-Optimizer](#1102)
- merged

[Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull
Request #1103 ·
NVIDIA/Model-Optimizer](#1103)
- merged

[code clean up by danielkorzekwa · Pull Request #1110 ·
NVIDIA/Model-Optimizer](#1110)
- merged

Merging into main:

[Activation hooks redesign (reuse hooks component across both minitron
and puzzletron) by danielkorzekwa · Pull Request #1022 ·
NVIDIA/Model-Optimizer](#1022)
- merged

[Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa
· Pull Request #1115 ·
NVIDIA/Model-Optimizer](#1115)
- merged

</details>

<!-- Details about the change. -->

### Usage

Puzzletron tutorial:

https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

### Testing
The main e2e test for compressing 9 models with Puzzletron:

https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

2-GPU nightly tests:

- https://github.com/NVIDIA/Model-Optimizer/actions/runs/24468209205/job/71501061203
- https://github.com/NVIDIA/Model-Optimizer/actions/runs/24470214159/job/71508152952

### Before your PR is "*Ready for review*"
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added Puzzletron: end-to-end heterogeneous pruning & NAS workflow with
AnyModel support, example pipelines, deployment and evaluation
utilities, and tools for converting/pruning and exporting compressed
checkpoints.

* **Documentation**
* Comprehensive Puzzletron tutorials, model-specific guides, evaluator
instructions, example configs, and changelog entry.

* **Chores**
* CI/workflow updates (extras installation, longer GPU test timeout),
pre-commit hook exclusion updated, and CODEOWNERS entries added.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>