Closed
Hello, and thanks for sharing your code. I came across your repo while looking into how to implement muP for Mamba. It looks like you implemented muP without scaling any attention-like matrices. Does that mean muP works with Mamba out of the box, as long as the right initializations (from the mup package, for example) are in place?
Thanks for your help.