Block interface: parameter and linear config, separate SSM config. #358
Merged
This was referenced Sep 3, 2025
✨ Description
Not really complete by itself, but extracted as a separate PR to limit PR scope.
New(ish) concepts:
- Parameters are created through `ParameterConfig.get_parameter`, and most layers (more to come) are created through `[LayerConfig].get_layer`. This ensures correct, standardized creation, and leaves more room for new additions (ex. dynamic types.)
- New options are unset by default (`None` or a special `default` marker), and `get_parameter`/`get_layer` take default values as arguments. This way we keep existing behaviour as default and make the new options truly optional and opt-in. (Otherwise, things like disabling biases, setting initialization scale or lr scale would have needed manual setting of every single parameter.)

Main changes:
- `ParameterConfig` as the new standard way to configure and instantiate (`get_parameter`) every parameter. Currently a placeholder config, but standard parameters (lr scale, initialization, maybe more) will be added in the next PRs.
- `OptionalParameterConfig` for weights that may be enabled or disabled (ex. biases). It comes with an `enabled` option, with the default provided by the parent layer.
- Linear layer configs with `get_layer`, which takes non-config arguments as well as defaults for `bias.enabled` (`default_add_bias`) and initialization (customizable initialization will come later).
- `CausalConv1d` layer (based on the Mamba 2 and Discrete Mamba 2 implementations) and config. The config is similar to `AffineLinearConfig`, but also supports a custom activation, with the default set by the parent layer.
- Separate SSM configs: `MambaConfig`, `Mamba2Config`, `DiscreteMamba2Config`. Things are a bit awkward for now because of the double configuration (`hybrid_block_layout`, `ssm.type`), but this will be addressed in an upcoming PR.
- Remove the `auto_grad_accumulation` arguments, as things work without them, and removing them allows mixing auto and non-auto accumulation (dt bias).

Config/breaking changes:
- … `d_xb` in a Mamba 1 layer will cause a crash.)
- … `type`.
- Removed `ssm.add_bias_linear`, `AddLinearBiasChoices`. `add_linear_biases: bool` is kept as the only global option for biases, at least for now. Other options may be achieved through individual layer configs.
- `ssm.expansion_factor` removed (redundant)
- `ssm.conv_kernel_dimension` -> `ssm.convolution_layer.kernel_size`
- `ssm.activation_type` -> `ssm.convolution_layer.activation`
- `conv1d_weight` -> `convolution.weight`
- `conv1d_bias` -> `convolution.bias`
- `dt_proj_weight` -> `dt_proj.weight`
- `dt_proj_bias` -> `dt_proj.bias`

TODO: