Skip to content

[Enhance] Support internal metrics for gdn A_log and norm & vl model in general#1615

Merged
HAOCHENYE merged 2 commits intoInternLM:mainfrom
nil0x9:linty/add-gdn-metrics
Mar 29, 2026
Merged

[Enhance] Support internal metrics for gdn A_log and norm & vl model in general#1615
HAOCHENYE merged 2 commits intoInternLM:mainfrom
nil0x9:linty/add-gdn-metrics

Conversation

@nil0x9
Copy link
Copy Markdown
Collaborator

@nil0x9 nil0x9 commented Mar 23, 2026

Summary

This PR adds internal metrics monitoring support for Gated DeltaNet (GDN) architecture, specifically tracking min/max statistics for A_log parameters and FusedRMSNormGated modules. It also adds general support for vision-language models by properly extracting the language model component.

Implementation Details

New metrics fields:

  • Added weight_min and weight_max to track parameter value ranges across layers
  • Added monitor_gdn_stats configuration flag (default: None)

Key changes:

  1. calculate_module_weight_min_max method: Computes min/max values across module parameters with proper handling of:
    - DTensor (converts to local tensor before computation)
    - Distributed reduction across all ranks using dist.all_reduce
    - Both nn.Module and torch.Tensor inputs
  2. VL model support: The recorder now automatically extracts language_model from VL models if present, allowing the metrics system to work seamlessly with
    multi-modal architectures.
  3. GDN-specific monitoring: Automatically registers hooks for:
    - FusedRMSNormGated modules (tracks min/max of gating parameters)
    - A_log attributes (tracks min/max of log-space state transition parameters)

@nil0x9
Copy link
Copy Markdown
Collaborator Author

nil0x9 commented Mar 23, 2026

@claude add PR description

@nil0x9 nil0x9 force-pushed the linty/add-gdn-metrics branch from b3018e9 to bdf5e01 Compare March 23, 2026 11:09
@nil0x9 nil0x9 changed the title [Enhance] Support internal metrics for gdn A_log and norm [WIP][Enhance] Support internal metrics for gdn A_log and norm Mar 23, 2026
@nil0x9 nil0x9 force-pushed the linty/add-gdn-metrics branch from bdf5e01 to 86f31ce Compare March 25, 2026 09:39
@nil0x9 nil0x9 changed the title [WIP][Enhance] Support internal metrics for gdn A_log and norm [WIP][Enhance] Support internal metrics for gdn A_log and norm & vl model in general Mar 25, 2026
@nil0x9 nil0x9 force-pushed the linty/add-gdn-metrics branch from 86f31ce to ed0ed7c Compare March 25, 2026 09:44
@nil0x9 nil0x9 changed the title [WIP][Enhance] Support internal metrics for gdn A_log and norm & vl model in general [Enhance] Support internal metrics for gdn A_log and norm & vl model in general Mar 25, 2026
@nil0x9
Copy link
Copy Markdown
Collaborator Author

nil0x9 commented Mar 25, 2026

@claude review

def __init__(self, internal_metrics_cfg: InternalMetricsConfig, model: XTunerBaseModel):
self.internal_metrics_cfg = internal_metrics_cfg
self.model = model
self.model = model.language_model if hasattr(model, "language_model") else model
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: This change silently replaces self.model for all VL models, which means every other metric method (calculate_module_weight_rms, register_attn_output_hook, pop_metrics' named_modules() loop, MoE checks, etc.) now operates on language_model instead of the full composite model.

This is a behavioral change for all existing VL model users, not just GDN metrics. Consider whether this is intentional. If the goal is only to iterate over language-model modules for GDN stats, it would be safer to scope the unwrapping narrowly:

Suggested change
self.model = model.language_model if hasattr(model, "language_model") else model
self.model = model
self._language_model = model.language_model if hasattr(model, "language_model") else model

Then use self._language_model only in the GDN-specific iteration in pop_metrics.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 25, 2026

Claude: ## Review Summary

This PR adds internal metrics monitoring for GDN (Gated DeltaNet) A_log parameters and FusedRMSNormGated modules, plus VL model support via language_model unwrapping.

Issues

Critical

  • self.model reassignment (line 82): Replacing self.model with model.language_model for VL models changes behavior for all existing metrics (weight RMS, attention stats, MoE stats), not just the new GDN metrics. This is a silent behavioral change that could break or alter existing monitoring. Recommend scoping the unwrapping narrowly.

Warning

  • Top-level import of gated_deltanet (line 16): Will break internal_metrics.py import in environments without the fla dependency. Should be a conditional import.
  • Per-module all_reduce in calculate_module_weight_min_max: Two blocking collectives per monitored module could be expensive. Consider batching min/max across all modules and doing a single reduction.

Nit

  • Type annotation nn.Module | nn.Parameter | torch.Tensor is redundant — nn.Parameter is a torch.Tensor, so nn.Module | torch.Tensor suffices.
  • Inline DTensor import could be moved to module level since it's from standard PyTorch.
  • Missing PR description — would help reviewers understand the motivation and scope.

Verdict

REQUEST_CHANGES

@nil0x9 nil0x9 force-pushed the linty/add-gdn-metrics branch from 551d0b6 to 1817556 Compare March 25, 2026 12:03
@nil0x9 nil0x9 force-pushed the linty/add-gdn-metrics branch from 1817556 to 8c66dd4 Compare March 25, 2026 12:12
@HAOCHENYE HAOCHENYE merged commit baa4675 into InternLM:main Mar 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants