[bug] `test_checkpoint` test not passing when any lr scale is set to 0 #265

# 🐞 Describe the Bug
Whenever the lr scale of any component is set to 0, e.g. `model.base_model.transformer.mlp_lr_scale=0`, `test_checkpoint` fails with:

```
FAILED tests/test_checkpoint.py::test_load_pretrained_distributed_checkpoint - AssertionError: torch.Size([0]) != torch.Size([786432])
```

I wonder how critical this is for loading and saving checkpoints that were trained with lr scaling.
This may be related to #256.

# 🔄 Steps to Reproduce

Steps to reproduce the behavior:
Add e.g. `model.base_model.transformer.mlp_lr_scale=0` [here](https://github.com/ServiceNow/Fast-LLM/blob/3ac976bdab51d44e775a9a7c2a30f4f18779c16a/tests/common.py#L155) and run `test_checkpoint`.

The same happens when the lr is set to zero using the per-layer lr scale from #243 and #258, although in that case more than one test in `test_checkpoint` fails.
Importantly, if the line `self.requires_grad = requires_grad and any(lr_scale_ != 0 for lr_scale_ in self.lr_scale)` [here](https://github.com/ServiceNow/Fast-LLM/blob/3ac976bdab51d44e775a9a7c2a30f4f18779c16a/fast_llm/tensor.py#L237) is replaced with a simple `self.requires_grad = requires_grad`, the test passes.
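
To make the difference between the two assignments concrete, here is a minimal standalone sketch of the expression quoted above. The function names `effective_requires_grad` and `plain_requires_grad` are hypothetical and exist only for illustration; this is not the Fast-LLM class itself. It only shows that a parameter whose lr scales are all 0 ends up with `requires_grad=False`. Whether that is what produces the empty `torch.Size([0])` in the checkpoint test is an assumption, not something confirmed here.

```python
def effective_requires_grad(requires_grad: bool, lr_scale: tuple) -> bool:
    # Mirrors the expression quoted from fast_llm/tensor.py:
    # the parameter is trainable only if at least one lr scale is non-zero.
    return requires_grad and any(lr_scale_ != 0 for lr_scale_ in lr_scale)


def plain_requires_grad(requires_grad: bool, lr_scale: tuple) -> bool:
    # The replacement that makes the test pass: lr_scale is ignored.
    return requires_grad


if __name__ == "__main__":
    for scales in [(1.0,), (0.0,), (0.0, 0.5)]:
        print(
            scales,
            effective_requires_grad(True, scales),
            plain_requires_grad(True, scales),
        )
```

With `lr_scale=(0.0,)` the first function returns `False` while the second returns `True`, so the two variants differ only for parameters whose lr scales are all 0.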

# 🎯 Expected Behavior

Test passes.

# 📜 Environment Information
-
# 📝 Additional Context
-
