refactor: remove obsolete dynamic checkpoint mismatch handling (cleanup PR #4099)#4340

Open
RexBearIU wants to merge 1 commit into main from jackyf/cleanup-scan-layers-mismatch

RexBearIU commented Jul 2, 2026

Collaborator

Description

This PR cleans up the dynamic checkpoint mismatch error-handling logic originally introduced in PR #4099. Since the proactive validation of scan_layers from checkpoint metadata (PR #4304) catches mismatches early, the after-the-fact recovery logic is no longer needed.
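
To illustrate the fail-fast approach described above, here is a minimal sketch of a proactive check that compares the configured `scan_layers` value against the value recorded in checkpoint metadata before any weights are restored. The function name `validate_scan_layers`, the exception `ScanLayersMismatchError`, and the dictionary layout of the metadata are all hypothetical; the actual validation in PR #4304 may be structured differently.

```python
class ScanLayersMismatchError(ValueError):
  """Hypothetical error type for a scan_layers config/checkpoint disagreement."""


def validate_scan_layers(config_scan_layers: bool, checkpoint_metadata: dict) -> None:
  """Fail fast if the checkpoint was saved with a different scan_layers setting.

  `checkpoint_metadata` is an assumed plain dict; the real metadata layout may differ.
  """
  saved = checkpoint_metadata.get("scan_layers")
  if saved is None:
    # The metadata does not record the setting, so there is nothing to compare.
    return
  if bool(saved) != config_scan_layers:
    raise ScanLayersMismatchError(
        f"scan_layers mismatch: config has scan_layers={config_scan_layers}, "
        f"but the checkpoint was saved with scan_layers={bool(saved)}."
    )
```

Because a check like this runs before restoration, a mismatch surfaces as one descriptive error rather than a structural or shape error raised partway through loading, which is why a recovery path after the failure becomes redundant.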

Key Changes:

  • Removed obsolete helpers (handle_checkpoint_mismatch, is_structural_or_shape_mismatch) from checkpointing.py.
  • Simplified the from_pretrained loading block in model_creation_utils.py.
  • Deleted obsolete unit tests (TestCheckpointMismatchHandling and related test cases) in checkpointing_nnx_load_test.py.
  • Updated test_scan_layers_mismatch_tpu in checkpointing_test.py to assert on proactive verification error messages.
  • Fixed an f-string quoting compatibility issue in inspect_checkpoint.py to allow pylint to pass on Python < 3.12.
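
For context on the last item, the sketch below shows one common f-string quoting pattern that parses only on Python 3.12 and later (PEP 701) and a portable rewrite. It assumes the issue was a reused quote character inside a replacement field; the exact line changed in inspect_checkpoint.py is not shown here, and the `record` dictionary is purely illustrative.

```python
record = {"name": "decoder", "shape": (4, 8)}

# Python 3.12+ only (PEP 701): reuses double quotes inside the replacement field.
# On older interpreters this line is a SyntaxError, so it is left commented out.
# print(f"{record["name"]}: {record["shape"]}")

# Portable: use a different quote character inside the braces.
portable = f"{record['name']}: {record['shape']}"

# Also portable: bind the value to a local name first.
name = record["name"]
also_portable = f"{name}: {record['shape']}"

print(portable)  # prints "decoder: (4, 8)"
```

Either portable form parses on both old and new interpreters, so tools that run on Python < 3.12 can still analyze the file.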

Dependency Note:

⚠️ This PR must be merged AFTER PR #4304 (which implements the proactive metadata-based validation).

Tests

Tested this change by running the following unit and integration tests under the CPU platform:

  • tests/unit/checkpointing_nnx_load_test.py
  • tests/unit/model_creation_utils_test.py
  • tests/post_training/unit/lora_utils_test.py
  • tests/integration/checkpointing_test.py

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented Jul 2, 2026

Codecov Report

❌ Patch coverage is 72.17391% with 32 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/utils/model_creation_utils.py | 78.94% | 14 Missing and 6 partials ⚠️ |
| src/maxtext/common/checkpointing.py | 42.10% | 10 Missing and 1 partial ⚠️ |
| ...axtext/checkpoint_conversion/inspect_checkpoint.py | 0.00% | 1 Missing ⚠️ |

RexBearIU force-pushed the jackyf/cleanup-scan-layers-mismatch branch from 2349804 to 9a4fc10 on July 2, 2026 10:26
RexBearIU changed the title from "refactor(scan_layers): clean up dynamic checkpoint mismatch handling" to "refactor: remove obsolete dynamic checkpoint mismatch handling (cleanup PR #4099)" on Jul 2, 2026
RexBearIU force-pushed the jackyf/cleanup-scan-layers-mismatch branch from 9a4fc10 to 5b2fc87 on July 2, 2026 10:29
RexBearIU force-pushed the jackyf/cleanup-scan-layers-mismatch branch from 5b2fc87 to 009e7ad on July 2, 2026 10:58