[bug] Inconsistent init_method_std in test_load_distributed_checkpoint_dp2 #88

@jlamypoirier

🐞 Describe the Bug

test_load_distributed_checkpoint_dp2 fails with:

E           ValueError: Config diff:
E             `init_method_std_embed`: `0.022` != `0.0625`
E             `transformer.init_method_std`: `0.022` != `0.0625`
E             `transformer.init_method_std_attn_proj`: `0.011` != `0.03125`
E             `transformer.init_method_std_mlp_2`: `0.011` != `0.03125`
E             `transformer.init_method_std_mlp_1`: `0.022` != `0.0625`
E             `transformer.init_method_std_qkv`: `0.022` != `0.0625`

There must be an inconsistency between the config creation and loading methods. The bug is harmless here, since we're loading an already initialized checkpoint and the initialization values are not used, but it could be hiding a bigger problem.
Likely reason: the non-architecture config is validated before the pretrained architecture is loaded, so the wrong architecture is used to set the defaults.
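
A minimal sketch of how that ordering could produce this kind of diff. Everything in it is hypothetical: `ToyConfig`, the `hidden_size ** -0.5` default, the `sqrt(2 * num_layers)` scaling, and the sizes 2048 and 256 are assumptions chosen only because they reproduce the numbers above, not taken from the actual config code. It also says nothing about which side of the diff is the checkpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real model config.
    hidden_size: int = 2048  # assumed default architecture
    num_layers: int = 2  # assumed
    init_method_std: Optional[float] = None
    init_method_std_attn_proj: Optional[float] = None

    def validate(self) -> None:
        # Fill unset fields from whatever architecture is visible right now.
        # The formulas are assumptions, not the library's confirmed defaults.
        if self.init_method_std is None:
            self.init_method_std = self.hidden_size ** -0.5
        if self.init_method_std_attn_proj is None:
            self.init_method_std_attn_proj = (
                self.init_method_std / (2 * self.num_layers) ** 0.5
            )


# Hypothetical architecture stored with the pretrained checkpoint.
PRETRAINED_ARCHITECTURE = {"hidden_size": 256, "num_layers": 2}


def load_validate_first() -> ToyConfig:
    # Suspected order: defaults are derived from hidden_size=2048,
    # then the pretrained architecture overwrites hidden_size.
    config = ToyConfig()
    config.validate()
    for name, value in PRETRAINED_ARCHITECTURE.items():
        setattr(config, name, value)
    return config


def load_architecture_first() -> ToyConfig:
    # Alternative order: apply the pretrained architecture, then derive defaults.
    config = ToyConfig(**PRETRAINED_ARCHITECTURE)
    config.validate()
    return config


def config_diff(a: ToyConfig, b: ToyConfig) -> dict:
    # Collect the fields whose values differ between the two configs.
    return {
        name: (getattr(a, name), getattr(b, name))
        for name in a.__dataclass_fields__
        if getattr(a, name) != getattr(b, name)
    }


if __name__ == "__main__":
    for name, (x, y) in config_diff(
        load_validate_first(), load_architecture_first()
    ).items():
        print(f"{name}: {x:.3g} != {y:.3g}")
```

In this toy version both configs end up with the same `hidden_size`, yet the derived std fields disagree, which is the shape of the failure reported above.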

🔄 Steps to Reproduce

Run test_load_distributed_checkpoint_dp2.

🎯 Expected Behavior

The test passes.
