Skip to content

feat: merge recent work into main#1

Merged
zhangh43 merged 1 commit into
mainfrom
merge
Jul 7, 2025
Merged

feat: merge recent work into main#1
zhangh43 merged 1 commit into
mainfrom
merge

Conversation

@zhangh43

@zhangh43 zhangh43 commented Jul 7, 2025

Copy link
Copy Markdown
Contributor

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

@CLAassistant

CLAassistant commented Jul 7, 2025

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@zhangh43 zhangh43 merged commit 92178dd into main Jul 7, 2025
3 checks passed
@zhangh43 zhangh43 deleted the merge branch July 7, 2025 08:23
githubzilla added a commit that referenced this pull request Mar 21, 2026
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
githubzilla added a commit that referenced this pull request Mar 21, 2026
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
githubzilla added a commit that referenced this pull request Mar 24, 2026
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
liunyl added a commit that referenced this pull request Jun 13, 2026
Two distinct bugs were chained under the quarantined LargeObjLRU-Test (the
FastMetaDataMutex tls_shard_idx spin was fixed earlier in this branch):

1. Engine bug (template_cc_map.h, FindEmplace, LO_LRU policy): when the
   large-object page is the LAST page in the map and a key greater than it
   is inserted, the `target_it == ccmp_.end()` branch creates the new
   next-page but never repoints `target_page` to it (the parallel `else`
   branch does). The key is then Emplaced back into the large-object page,
   and TryUpdatePageKey calls FirstKey() on the still-empty new page ->
   assert(!keys_.empty()) (TC-FE-01). In NDEBUG this is worse: FirstKey()
   reads keys_.front() on an empty vector (UB) and the large-object
   "alone on its page" invariant is violated. Fix: add the missing
   `target_page = target_it->second.get();`, mirroring the else branch.

2. Test-setup bug (LargeObjLRU-Test.cpp, 4 sites): tests create partial-
   commit dirty entries and compensated dirty_data_key_count_ via
   f.shard.AdjustDataKeyStats(...) -- the SHARD counter only. OnCommittedUpdate
   normally bumps both the shard and the MAP counters, so the map's
   dirty_data_key_count_ was left at 0 while the entries are dirty, and
   Terminate()'s decrement underflowed it (assert in
   TemplateCcMap::AdjustDataKeyStats). This surfaced only once bug #1 stopped
   killing the suite earlier. Fix: drive the real clean->dirty API
   (cc_map.OnCommittedUpdate) so both counters stay in step. (This is
   unrelated to the production dirty-count underflow paths in #465.)

With both fixes the full LargeObjLRU-Test (23 cases, 3701 assertions) passes
deterministically, so it is un-gated in CI (only ClusterCrossNg-Test remains
non-gating). Full ctest suite: 40/40 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
liunyl pushed a commit that referenced this pull request Jun 15, 2026
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants