🎯 Goal (What & Why)
Currently, a pretrained config overrides an arbitrary part of the user-specified config. This causes a lot of trouble:
I suggest flipping this around so that the user-specified model config overrides the pretrained config. This should give us the behaviour we want in most cases:
- Pretrained config, no base model config: All architecture parameters are imported, and so are relevant non-architecture parameters (e.g. `window_size`). Other non-architecture parameters take the Fast-LLM default.
- Pretrained config, base model config with non-architecture parameters: Parameters explicitly specified in the base model config take precedence; all others are set as above.
- Pretrained config, base model config with architecture parameters: We probably want to enforce matching values and raise an error on any mismatch. (This would be an improvement, since wrong values are currently ignored silently.)
- No pretrained config: Same as before.
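The four cases above can be sketched as a single merge function. This is a minimal illustration using plain dicts, not Fast-LLM's actual config classes: `ARCHITECTURE_KEYS`, `FAST_LLM_DEFAULTS` and `resolve_model_config` are hypothetical names, and the pretrained dict is assumed to already contain only the imported (architecture plus relevant non-architecture) parameters.

```python
from typing import Any, Optional

# Hypothetical: parameter names treated as "architecture" parameters.
ARCHITECTURE_KEYS = {"hidden_size", "num_layers", "num_attention_heads"}

# Hypothetical Fast-LLM defaults for a handful of parameters.
FAST_LLM_DEFAULTS: dict[str, Any] = {
    "hidden_size": 1024,
    "num_layers": 24,
    "num_attention_heads": 16,
    "window_size": None,
    "dropout": 0.0,
}


def resolve_model_config(
    pretrained: Optional[dict[str, Any]],
    user: dict[str, Any],
) -> dict[str, Any]:
    """Merge configs so that the user-specified config overrides the pretrained one."""
    # Start from Fast-LLM defaults (this alone covers the "no pretrained config" case).
    resolved = dict(FAST_LLM_DEFAULTS)
    if pretrained is not None:
        # Import everything the pretrained config provides.
        resolved.update(pretrained)
        # Explicit architecture values must match the pretrained ones; mismatches are errors.
        for key in sorted(ARCHITECTURE_KEYS.intersection(user)):
            if key in pretrained and user[key] != pretrained[key]:
                raise ValueError(
                    f"{key}={user[key]!r} conflicts with pretrained value {pretrained[key]!r}"
                )
    # Explicitly specified parameters take precedence over imported values and defaults.
    resolved.update(user)
    return resolved
```

For example, `resolve_model_config({"hidden_size": 4096, "window_size": 4096}, {"dropout": 0.1})` keeps the pretrained `hidden_size` and `window_size`, takes the user's `dropout`, and fills everything else from the defaults, while passing `{"hidden_size": 2048}` as the user config raises a `ValueError`.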
🚀 Execution Plan
We can use Fast-LLM's override mechanism as in #168.
However, we'll also need to adapt the update mechanism to get the behaviour we want for nested configs.
Backward compatibility could also be difficult to achieve.
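The nested-config issue comes from the difference between replacing a sub-config wholesale and merging it key by key. The sketch below shows a recursive override on plain dicts; `override_config` is a hypothetical helper, not the override mechanism from #168, and Fast-LLM's real configs are classes rather than dicts.

```python
import copy
from typing import Any


def override_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `override` applied recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sub-configs key by key instead of replacing them wholesale.
            result[key] = override_config(result[key], value)
        else:
            # Leaf values (and keys absent from `base`) come from the override.
            result[key] = copy.deepcopy(value)
    return result


pretrained = {"transformer": {"hidden_size": 4096, "window_size": 4096}}
user = {"transformer": {"window_size": 2048}}

# A shallow update drops the pretrained hidden_size along with the whole sub-dict.
shallow = {**pretrained, **user}
# The recursive override keeps hidden_size and only changes window_size.
merged = override_config(pretrained, user)
```

Here `shallow["transformer"]` is `{"window_size": 2048}`, whereas `merged["transformer"]` is `{"hidden_size": 4096, "window_size": 2048}`.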
📌 Acceptance Criteria (Must-Haves for Completion)
- Config merging behaves as described in the four cases above.
🛠️ Project Management
- `Estimate` field (in days) in the GitHub project.
- `Size` field to categorize the PR size (Small/Medium/Large).