bug: add mutex to protect range entry #3

Merged: lokax merged 4 commits into eloqdata:main from lokax:fix-assert-07-11 on Jul 18, 2025

Conversation

lokax (Collaborator) commented Jul 11, 2025

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the issue link using fixes eloqdb/tx_service#issue_id
  • Reference the RFC link, if one exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

lokax changed the title from "Add mutex to protect range entry" to "bug: add mutex to protect range entry" on Jul 11, 2025
lokax force-pushed the fix-assert-07-11 branch from 8bbe714 to a7c9b96 on July 18, 2025 07:02
lokax merged commit c531a56 into eloqdata:main on Jul 18, 2025
2 checks passed
githubzilla added a commit that referenced this pull request Mar 21, 2026
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.
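
The skip-and-retry rule above can be illustrated with a minimal sketch. It is a simplified model, not the tx_service code: the `Entry` struct, the free function form of `CanBeCleaned`, and the `CleanEntries` helper are all hypothetical stand-ins, and only the `BeingCkpt` flag is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a CcEntry; only the flag that
// matters for this bug is modelled.
struct Entry
{
    int key;
    bool being_ckpt;  // true while a checkpoint is flushing this entry

    bool GetBeingCkpt() const { return being_ckpt; }
};

// Mirrors the rule described in the fix: an entry may be cleaned only when
// no checkpoint is in flight for it.
inline bool CanBeCleaned(const Entry &e)
{
    return !e.GetBeingCkpt();
}

// Hypothetical clean pass: frees every cleanable entry and returns how many
// were skipped, so the caller knows a retry is needed.
inline std::size_t CleanEntries(std::vector<Entry> &entries)
{
    std::size_t skipped = 0;
    std::vector<Entry> kept;
    for (const Entry &e : entries)
    {
        if (CanBeCleaned(e))
        {
            continue;  // safe to free: no checkpoint callback will touch it
        }
        kept.push_back(e);  // still being checkpointed, leave it in place
        ++skipped;
    }
    entries.swap(kept);
    return skipped;
}
```

In this model a non-zero return value means the clean pass should run again once the in-flight checkpoint has finished and cleared the flag.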

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(). The latter overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. OnCommittedUpdate was also missing in the ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.
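
Why the call order matters can be seen in a small model. The sketch below assumes a packed word whose lowest bit is the flush bit and a `SetCkptTs` that sets it; that layout, the single-argument signatures, and the `BackFillOldOrder`/`BackFillNewOrder` helpers are assumptions made for illustration and may not match the real CcEntry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bit 0 of the packed word is the "flushed" bit and the
// remaining bits hold the commit timestamp. The real layout may differ.
struct Entry
{
    static constexpr std::uint64_t kFlushBit = 1;

    std::uint64_t commit_ts_and_status_ = 0;
    std::uint64_t ckpt_ts_ = 0;

    // Simplified stand-in: overwrites the whole packed word, which also
    // clears the flush bit.
    void SetCommitTsPayloadStatus(std::uint64_t commit_ts)
    {
        commit_ts_and_status_ = commit_ts << 1;
    }

    // Simplified stand-in: records the checkpoint timestamp and marks the
    // entry as flushed.
    void SetCkptTs(std::uint64_t ts)
    {
        ckpt_ts_ = ts;
        commit_ts_and_status_ |= kFlushBit;
    }

    bool IsDirty() const { return (commit_ts_and_status_ & kFlushBit) == 0; }
};

// Order before the fix: the flush bit set by SetCkptTs is wiped afterwards.
inline Entry BackFillOldOrder(std::uint64_t ts)
{
    Entry e;
    e.SetCkptTs(ts);
    e.SetCommitTsPayloadStatus(ts);
    return e;
}

// Order after the fix: the flush bit is set last and survives.
inline Entry BackFillNewOrder(std::uint64_t ts)
{
    Entry e;
    e.SetCommitTsPayloadStatus(ts);
    e.SetCkptTs(ts);
    return e;
}
```

Under these assumptions the old order leaves a backfilled entry looking dirty even though nothing counted it, while the new order leaves it clean.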

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.
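
The pairing of the status update with the counter notification can be sketched as follows. This is an illustrative model only: the `Shard` counter, the boolean dirty flag, the three-argument `OnCommittedUpdate`, and the `CommitWrite`/`FinishCheckpoint` helpers are hypothetical and do not reproduce the ClusterConfigCcMap code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-shard bookkeeping: how many entries still need a flush.
struct Shard
{
    std::size_t dirty_count = 0;
};

// Hypothetical entry reduced to a single dirty flag.
struct Entry
{
    bool dirty = false;

    // Stand-in for SetCommitTsPayloadStatus(): a committed write makes the
    // entry dirty but does not touch any counter by itself.
    void SetCommitTsPayloadStatus() { dirty = true; }
};

// Stand-in for OnCommittedUpdate(): counts a clean -> dirty transition.
inline void OnCommittedUpdate(Shard &shard, const Entry &e, bool was_dirty)
{
    if (!was_dirty && e.dirty)
    {
        ++shard.dirty_count;
    }
}

// Write path as described in the fix: the status update is followed by the
// counter notification.
inline void CommitWrite(Shard &shard, Entry &e)
{
    const bool was_dirty = e.dirty;
    e.SetCommitTsPayloadStatus();
    OnCommittedUpdate(shard, e, was_dirty);
}

// Checkpoint completion: the entry becomes clean and is un-counted.
inline void FinishCheckpoint(Shard &shard, Entry &e)
{
    if (e.dirty)
    {
        assert(shard.dirty_count > 0);  // fires if the write was never counted
        --shard.dirty_count;
        e.dirty = false;
    }
}
```

Calling `SetCommitTsPayloadStatus()` directly, without the notification, leaves a dirty entry that the counter does not know about, which is the mismatch this fix removes.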

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
githubzilla added a commit that referenced this pull request Mar 24, 2026
liunyl pushed a commit that referenced this pull request Jun 15, 2026