
Modular checkpointing #22

Merged

jlamypoirier merged 4 commits into main from modular_checkpointing
Oct 25, 2024


@jlamypoirier (Collaborator) commented Oct 23, 2024

✨ Description

Third round of checkpoint improvements.

Functional changes:

  • The checkpoint format and model_type are merged into a single entry: the original model_type becomes the format for external checkpoints. The old layout remains temporarily backward compatible.
  • The checkpoint conversion config now uses the new checkpoint config format, i.e. input: CheckpointLoadConfig, output: CheckpointSaveConfig. The old layout remains temporarily backward compatible.
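To make the two functional changes concrete, here is a minimal sketch of how a legacy (format, model_type) pair could be folded into the single format entry, and how a conversion config with an input load config and an output save config might be parsed. The `CheckpointLoadConfig` and `CheckpointSaveConfig` dataclasses below are simplified, hypothetical stand-ins rather than the actual Fast-LLM classes, and the legacy value `"external"`, the `path` field, `merge_format` and `ConvertConfig` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckpointLoadConfig:
    """Hypothetical, simplified stand-in; not the actual Fast-LLM class."""
    path: str
    format: str = "distributed"


@dataclass
class CheckpointSaveConfig:
    """Hypothetical, simplified stand-in; not the actual Fast-LLM class."""
    path: str
    format: str = "distributed"


def merge_format(raw: dict[str, Any]) -> dict[str, Any]:
    """Fold a legacy (format, model_type) pair into a single `format` entry.

    Assumptions: the legacy format value "external" meant "look at model_type",
    and model_type is dropped for every other format.
    """
    raw = dict(raw)  # do not mutate the caller's dict
    model_type = raw.pop("model_type", None)
    if raw.get("format") == "external" and model_type is not None:
        # The former model_type becomes the format of the external checkpoint.
        raw["format"] = model_type
    return raw


@dataclass
class ConvertConfig:
    """Hypothetical conversion config: read one checkpoint, write another."""
    input: CheckpointLoadConfig
    output: CheckpointSaveConfig

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ConvertConfig":
        return cls(
            input=CheckpointLoadConfig(**merge_format(raw["input"])),
            output=CheckpointSaveConfig(**merge_format(raw["output"])),
        )
```

For example, `ConvertConfig.from_dict({"input": {"path": "ckpt/", "format": "external", "model_type": "llama"}, "output": {"path": "out/", "format": "distributed"}})` yields an input config whose format is `"llama"`, while the output format stays `"distributed"`.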

Dev changes:

  • Checkpointing is now modular: all checkpoint formats implement the same simple interface.
  • Users can define arbitrary checkpoint formats and modify existing ones as needed. The default distributed and state dict formats remain the official, supported formats, but are no longer hard-coded.
  • Moved some content from FastLLMModel to MultiStage. FastLLMModel is now nearly empty, so I'll probably merge the two classes soon.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

@jlamypoirier marked this pull request as ready for review October 24, 2024 19:07
@jlamypoirier mentioned this pull request Oct 25, 2024
@jlamypoirier merged commit 8fee762 into main Oct 25, 2024
@jlamypoirier deleted the modular_checkpointing branch October 25, 2024 19:17
@tscholak added this to the 0.2.0 milestone Oct 25, 2024


2 participants