Block interface: rework LM config, fine-grained initialization, lr_scale, peft #360

Merged
jlamypoirier merged 109 commits into main from block_interface_fine_grained on Sep 18, 2025
@jlamypoirier jlamypoirier commented Sep 3, 2025

✨ Description

Rework LM config:

  • Extract embedding and output layer configs.
  • Rename tie_word_embeddings -> output_layer.tied_weight
  • Position embeddings are now enabled through embeddings_layer.position_embeddings.enabled and are disabled by default, independently of rotary embeddings.
  • Rename max_position_embeddings -> embeddings_layer.num_position_embeddings
  • Rename parallel_embeddings -> embeddings_layer.vocab_parallel
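
For anyone updating an existing config, the three renames above can be sketched as a small migration helper. This is only an illustration: `migrate_lm_config` and the `RENAMES` table are hypothetical names, the old config is assumed to be a flat dict, and `hidden_size` in the usage note is a placeholder key. It does not set embeddings_layer.position_embeddings.enabled, which has to be decided per model.

```python
from typing import Any

# Old flat key -> new nested path, copied from the renames listed above.
RENAMES = {
    "tie_word_embeddings": ("output_layer", "tied_weight"),
    "max_position_embeddings": ("embeddings_layer", "num_position_embeddings"),
    "parallel_embeddings": ("embeddings_layer", "vocab_parallel"),
}


def migrate_lm_config(old: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with renamed keys moved to their nested location.

    Hypothetical helper: assumes `old` is a flat dict of LM options.
    Keys that are not in RENAMES are copied through unchanged.
    """
    new: dict[str, Any] = {}
    for key, value in old.items():
        path = RENAMES.get(key, (key,))
        node = new
        # Create intermediate sections (e.g. "embeddings_layer") on demand.
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return new
```

For example, `migrate_lm_config({"tie_word_embeddings": False, "hidden_size": 1024})` moves the first key under `output_layer` as `tied_weight` and leaves the second one untouched.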

Rework initialization config:

  • Remove most ad-hoc initialization arguments (leftovers from Block interface: extract mixer and mlp config #359)
  • Add dynamic initialization config scheme so initialization may be arbitrarily configured.
  • Add optional initialization config to all parameters. If not set, the default set by the parent layer will be used, matching previous behaviour.
  • Mamba: remove dt_init and dt_scale, since the same effect can be obtained through the new init config scheme. Replace dt_min, dt_max and dt_init_floor with the mamba_dt_bias initialization type, which has similar options.
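
As background for the dt_min, dt_max and dt_init_floor options, here is a minimal sketch of the dt bias initialization used in the reference Mamba implementation, written in plain Python for a single value. That mamba_dt_bias follows exactly this recipe is an assumption, and `sample_dt_bias` and `softplus` are illustrative names rather than part of this PR.

```python
import math
import random


def sample_dt_bias(dt_min: float, dt_max: float, dt_init_floor: float,
                   rng: random.Random) -> float:
    """Sample one dt bias value (hypothetical helper, reference-Mamba style)."""
    # Draw dt log-uniformly in [dt_min, dt_max].
    log_dt = rng.random() * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
    # Clamp from below so dt never gets smaller than the floor.
    dt = max(math.exp(log_dt), dt_init_floor)
    # Inverse softplus: return the bias b such that softplus(b) == dt.
    return dt + math.log(-math.expm1(-dt))


def softplus(x: float) -> float:
    """softplus(x) = log(1 + exp(x)), the activation applied to the dt bias."""
    return math.log1p(math.exp(x))
```

Applying `softplus` to a sampled bias recovers a step size between dt_min and dt_max, which is the point of storing the inverse-softplus value.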

Rework LR scales:

  • Add lr_scale option to all parameters and most layers.
  • LR scales combine multiplicatively, i.e. the actual LR scale for a given parameter is the product of its own lr_scale and those of all its parent layers.
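
The multiplicative rule can be illustrated with a small sketch. The `Node` class and `effective_lr_scale` function are hypothetical and do not reflect the actual config classes; treating an unset lr_scale as 1.0 is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A parameter or layer in a hypothetical config tree."""
    name: str
    lr_scale: Optional[float] = None  # None is treated as 1.0 (assumption)
    parent: Optional["Node"] = None


def effective_lr_scale(node: Node) -> float:
    """Multiply the node's own lr_scale with those of all its ancestors."""
    scale = 1.0
    current: Optional[Node] = node
    while current is not None:
        if current.lr_scale is not None:
            scale *= current.lr_scale
        current = current.parent
    return scale
```

With a block at lr_scale 0.5, an unset layer below it and a weight at lr_scale 0.1, the weight would train at 0.05 times the base learning rate.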

Rework Peft (lora):

  • Add apply_peft option to linear layers. If true, peft will be enabled for that layer (ex. wrapped with lora), otherwise the layer will be treated as non-peft (ex. frozen or ignored). If left unset, the default set by the parent layer will be used instead (False except for attn query and value).
  • Remove transformer peft config, use peft config directly instead. (It was only there to determine the peft layers, which is now handled in the linear config.)
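
The fallback behaviour of apply_peft can be sketched as a tri-state option. `resolve_apply_peft` and the `ATTENTION_DEFAULTS` table are hypothetical; the `key` and `dense` entries are placeholder layer names, and only the query/value defaults come from the description above.

```python
from typing import Optional


def resolve_apply_peft(layer_setting: Optional[bool], parent_default: bool) -> bool:
    """Use the layer's explicit apply_peft if set, else the parent's default."""
    return parent_default if layer_setting is None else layer_setting


# Parent-provided defaults: False except for attention query and value.
# "key" and "dense" are placeholder names for the other linear layers.
ATTENTION_DEFAULTS = {"query": True, "key": False, "value": True, "dense": False}
```

An unset query layer resolves to True, while an explicit `False` on the same layer overrides the parent default.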

Todo (next prs):

@tscholak tscholak left a comment

LGTM!

@jlamypoirier jlamypoirier marked this pull request as ready for review September 17, 2025 21:42
Base automatically changed from block_interface_mixer_mlp_config to main September 18, 2025 21:12
@jlamypoirier jlamypoirier merged commit d3fef01 into main Sep 18, 2025
2 checks passed
@jlamypoirier jlamypoirier deleted the block_interface_fine_grained branch September 18, 2025 21:13
2 participants