fix: prevent dirty_data_key_count_ underflow from three sources #465
…erflow Update data_substrate submodule to include the fix for an intermittent assertion failure in AdjustDataKeyStats where dirty_data_key_count_ underflows during cluster scale-out. See eloqdata/tx_service#465 for details.
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could free CcEntries with BeingCkpt=true, causing a dirty count double-decrement when the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData, CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and clears the flush bit, leaving the entry dirty without incrementing the counter. OnCommittedUpdate was also missing in the ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add OnCommittedUpdate in both the BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax the UpdateCceCkptTsCc assertions to allow IsPersistent() being true, since a concurrent BackFill/ReadOutsideCc can legitimately mark an entry persistent before the checkpoint callback runs.
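The Bug #3 race from the commit message above can be sketched roughly as follows. This is a simplified model, not the tx_service code: `Entry`, `Shard`, `Clean`, `MarkDirty`, and `OnCheckpointDone` are hypothetical stand-ins, and only the idea that `CanBeCleaned` rejects entries with the being-checkpointed flag set comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; not the real tx_service types.
struct Entry {
    bool dirty = false;
    bool being_ckpt = false;  // set while an async flush is in flight
    bool freed = false;
};

struct Shard {
    std::size_t dirty_count = 0;
    std::vector<Entry> entries;

    void MarkDirty(Entry &e) {
        if (!e.dirty) { e.dirty = true; ++dirty_count; }
    }

    // Mirrors the fix: an entry under checkpoint is not eligible for cleaning.
    static bool CanBeCleaned(const Entry &e) { return !e.being_ckpt; }

    // Returns true when every live entry was freed. Returns false as soon as
    // an entry has to be skipped, which asks the caller to retry later.
    bool Clean() {
        for (Entry &e : entries) {
            if (e.freed) continue;
            if (!CanBeCleaned(e)) return false;  // stop and retry later
            if (e.dirty) {
                assert(dirty_count > 0);
                --dirty_count;
                e.dirty = false;
            }
            e.freed = true;
        }
        return true;
    }

    // Checkpoint callback: the flush finished, so the entry becomes clean.
    void OnCheckpointDone(Entry &e) {
        e.being_ckpt = false;
        if (e.dirty) {
            assert(dirty_count > 0);
            --dirty_count;
            e.dirty = false;
        }
    }
};
```

In this model each dirty entry is decremented exactly once, either by the checkpoint callback or by the cleaner, because the cleaner never frees an entry whose flush is still in flight.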
Two distinct bugs were chained under the quarantined LargeObjLRU-Test (the FastMetaDataMutex tls_shard_idx spin was fixed earlier in this branch):

1. Engine bug (template_cc_map.h, FindEmplace, LO_LRU policy): when the large-object page is the LAST page in the map and a key greater than it is inserted, the `target_it == ccmp_.end()` branch creates the new next-page but never repoints `target_page` to it (the parallel `else` branch does). The key is then Emplaced back into the large-object page, and TryUpdatePageKey calls FirstKey() on the still-empty new page -> assert(!keys_.empty()) (TC-FE-01). In NDEBUG this is worse: FirstKey() reads keys_.front() on an empty vector (UB) and the large-object "alone on its page" invariant is violated.
   Fix: add the missing `target_page = target_it->second.get();`, mirroring the else branch.

2. Test-setup bug (LargeObjLRU-Test.cpp, 4 sites): tests created partial-commit dirty entries and compensated dirty_data_key_count_ via f.shard.AdjustDataKeyStats(...), which adjusts the SHARD counter only. OnCommittedUpdate normally bumps both the shard and the MAP counters, so the map's dirty_data_key_count_ was left at 0 while the entries were dirty, and Terminate()'s decrement underflowed it (assert in TemplateCcMap::AdjustDataKeyStats). This surfaced only once bug #1 stopped killing the suite earlier.
   Fix: drive the real clean->dirty API (cc_map.OnCommittedUpdate) so both counters stay in step. (This is unrelated to the production dirty-count underflow paths in #465.)

With both fixes the full LargeObjLRU-Test (23 cases, 3701 assertions) passes deterministically, so it is un-gated in CI (only ClusterCrossNg-Test remains non-gating). Full ctest suite: 40/40 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
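The test-setup bug (item 2 above) comes down to two counters that must move together. A minimal sketch, assuming a map-level counter that forwards to a shard-level counter; `ShardCounter`, `MapCounter`, `Adjust`, and `Release` are hypothetical names, and the `was_dirty` parameter is an assumption about the shape of the clean->dirty call:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shard-level dirty counter.
struct ShardCounter {
    std::int64_t dirty = 0;
    void Adjust(std::int64_t delta) { dirty += delta; }
};

// Hypothetical map-level dirty counter that also updates its shard.
struct MapCounter {
    ShardCounter &shard;
    std::int64_t dirty = 0;

    explicit MapCounter(ShardCounter &s) : shard(s) {}

    // Clean->dirty transition: bumps the map and shard counters together.
    void OnCommittedUpdate(bool was_dirty) {
        if (!was_dirty) {
            ++dirty;
            shard.Adjust(+1);
        }
    }

    // Teardown: releases dirty entries from both counters.
    // Returns false instead of underflowing the map counter.
    bool Release(std::int64_t dirty_entries) {
        if (dirty < dirty_entries) return false;
        dirty -= dirty_entries;
        shard.Adjust(-dirty_entries);
        return true;
    }
};
```

Adjusting only the shard counter leaves the map counter at 0, so the later release has nothing to subtract from; going through the single transition call keeps both counters in step.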
Summary

Fix an intermittent assertion failure in `CcShard::AdjustDataKeyStats`.

Three bugs identified and fixed, all causing `dirty_data_key_count_` to become inconsistent:

Bug #3 (confirmed root cause via core dump + diagnostic logs)
`cc_request.h`: `CanBeCleaned` for CleanBucketData/CleanRangeData/CleanRangeDataForMigration

During bucket migration, `CleanBucketData` could free CcEntries that have `BeingCkpt=true` (being checkpointed). The dirty count is decremented during cleanup and then decremented again when the `UpdateCceCkptTsCc` callback runs: a double-decrement that underflows.

Root cause sequence:

1. Checkpoint sets `BeingCkpt=true` on the entry and starts an async flush.
2. `CleanBucketData` runs (under WriteLock), `CanBeCleaned` returns `true` unconditionally, the entry is freed, `dirty_freed_cnt` decremented.
3. The `UpdateCceCkptTsCc` callback runs on CcShard: second decrement, underflow → assertion crash.

Fix: `CanBeCleaned()` now checks `!GetBeingCkpt()` for these three clean types. Entries being checkpointed are skipped; `clean_success_` is set to `false`, causing `Execute(KickoutCcEntryCc&)` in `template_cc_map.h` to break out of the page loop and re-enqueue the request to the same CC shard. The entry is cleaned on a subsequent retry once checkpoint clears `BeingCkpt`.

Design rationale: why not add bucket locks to DataSyncForHashPartition?

An alternative fix would be to have `DataSyncForHashPartition` acquire bucket read locks (as `DataSyncForRangePartition` already does), serializing checkpoint with migration and preventing the race entirely. However, this has significant downsides for hash partitions: hash checkpoint is dispatched per-core, and each core owns `1024 / num_cores` buckets (e.g. 128 on an 8-core node), so locking all of them via `ReadTxRequest` adds overhead to every checkpoint cycle even when no migration is in progress. The current approach (skip + retry) only adds latency to migration when there is an active checkpoint on the same bucket, a rare overlap in practice.

Bug #1
`template_cc_map.h`: `BackFill` wrong operation order + missing `OnCommittedUpdate`

`BackFill` called `SetCkptTs(commit_ts)` → `OnFlushed()` → `SetCommitTsPayloadStatus(commit_ts, status)`. The last call overwrites `commit_ts_and_status_` entirely, clearing the flush bit that `SetCkptTs` just set. This leaves the entry dirty without an `OnCommittedUpdate` increment. Affects `CatalogCcMap` and `RangeCcMap` (ObjectCcMap overrides BackFill correctly).

Also missing `OnCommittedUpdate` in the `ReadOutsideCc` backfill path.

Fix: Reorder to `SetCommitTsPayloadStatus` first, then `SetCkptTs`, then `OnFlushed` + `OnCommittedUpdate`. Add `OnCommittedUpdate` in the `ReadOutsideCc` path.

Bug #2
cluster_config_cc_map.h— missingOnCommittedUpdateTwo sites call
SetCommitTsPayloadStatus()withoutOnCommittedUpdate(), making the entry dirty without incrementing the counter.Fix: Capture
was_dirtybefore and callOnCommittedUpdateafter both sites.Assertion relaxation
`cc_req_misc.cpp`: `UpdateCceCkptTsCc` assertions

Relaxed 4 assertions from `assert(v_entry->CommitTs() > 1 && !v_entry->IsPersistent())` to `assert(v_entry->CommitTs() > 1)`. With the BackFill/ReadOutsideCc fixes, a concurrent backfill can legitimately mark an entry persistent before the checkpoint callback runs.

Files Changed
| File | Change |
| --- | --- |
| `cc_request.h` | `CanBeCleaned` returns `!GetBeingCkpt()` for migration/bucket/range clean |
| `template_cc_map.h` | `OnCommittedUpdate` in BackFill and ReadOutsideCc |
| `cluster_config_cc_map.h` | `OnCommittedUpdate` at both `SetCommitTsPayloadStatus` sites |
| `cc_req_misc.cpp` | `UpdateCceCkptTsCc` |
| `cc_shard.cpp` | |

Testing

Verified by running `cluster_scale_test.py` (all 5 tests) dozens of times with no assertion failures.
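For reference, the ordering problem described under Bug #1 can be illustrated with a packed word whose setter overwrites every bit. This is only a sketch under assumed details: the bit layout (two status bits, one flush bit, timestamp above them), the `kFlushBit` constant, and the condition inside `SetCkptTs` are hypothetical and are not taken from `template_cc_map.h`.

```cpp
#include <cassert>
#include <cstdint>

struct PackedEntry {
    // Assumed layout, for illustration only:
    // bits 0-1 = payload status, bit 2 = flushed, bits 3+ = commit timestamp.
    static constexpr std::uint64_t kFlushBit = std::uint64_t{1} << 2;
    std::uint64_t commit_ts_and_status = 0;

    // Overwrites the whole word, so any previously set flush bit is lost.
    void SetCommitTsPayloadStatus(std::uint64_t ts, std::uint64_t status) {
        commit_ts_and_status = (ts << 3) | (status & 0x3);
    }

    // Marks the entry flushed when the checkpoint covers its commit timestamp.
    void SetCkptTs(std::uint64_t ckpt_ts) {
        if (ckpt_ts >= CommitTs()) commit_ts_and_status |= kFlushBit;
    }

    std::uint64_t CommitTs() const { return commit_ts_and_status >> 3; }
    bool IsPersistent() const { return (commit_ts_and_status & kFlushBit) != 0; }
};
```

Calling `SetCkptTs` first and `SetCommitTsPayloadStatus` second leaves `IsPersistent()` false in this model, while the reversed order keeps the flush bit, which is the motivation for the reorder in the fix.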