Skip to content

fix: prevent dirty_data_key_count_ underflow from three sources#465

Merged
liunyl merged 2 commits into
mainfrom
fix/backfill-dirty-key-count-underflow
Jun 15, 2026
Merged

fix: prevent dirty_data_key_count_ underflow from three sources#465
liunyl merged 2 commits into
mainfrom
fix/backfill-dirty-key-count-underflow

Conversation

@githubzilla

@githubzilla githubzilla commented Mar 18, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fix an intermittent assertion failure in CcShard::AdjustDataKeyStats:

Assertion `dirty_delta >= 0 || dirty_data_key_count_ >= static_cast<size_t>(-dirty_delta)' failed.

Three bugs identified and fixed, all causing dirty_data_key_count_ to become inconsistent:

Bug #3 (confirmed root cause via core dump + diagnostic logs)

cc_request.hCanBeCleaned for CleanBucketData/CleanRangeData/CleanRangeDataForMigration

During bucket migration, CleanBucketData could free CcEntries that have BeingCkpt=true (being checkpointed). This causes dirty count to be decremented during cleanup, and then decremented again when UpdateCceCkptTsCc callback runs — a double-decrement that underflows.

Root cause sequence:

  1. Checkpoint collects entry, sets BeingCkpt=true, starts async flush
  2. CleanBucketData runs (under WriteLock), CanBeCleaned returns true unconditionally, entry freed, dirty_freed_cnt decremented
  3. UpdateCceCkptTsCc callback runs on CcShard — second decrement — underflow → assertion crash

Fix: CanBeCleaned() now checks !GetBeingCkpt() for these three clean types. Entries being checkpointed are skipped; clean_success_ is set to false, causing Execute(KickoutCcEntryCc&) in template_cc_map.h to break out of the page loop and re-enqueue the request to the same CC shard. The entry is cleaned on a subsequent retry once checkpoint clears BeingCkpt.

Design rationale — why not add bucket locks to DataSyncForHashPartition?

An alternative fix would be to have DataSyncForHashPartition acquire bucket read locks (as DataSyncForRangePartition already does), serializing checkpoint with migration and preventing the race entirely. However, this has significant downsides for hash partitions: hash checkpoint is dispatched per-core, and each core owns 1024 / num_cores buckets (e.g. 128 on an 8-core node), so locking all of them via ReadTxRequest adds overhead to every checkpoint cycle even when no migration is in progress. The current approach (skip + retry) only adds latency to migration when there is an active checkpoint on the same bucket — a rare overlap in practice.

Bug #1

template_cc_map.hBackFill wrong operation order + missing OnCommittedUpdate

BackFill called SetCkptTs(commit_ts)OnFlushed()SetCommitTsPayloadStatus(commit_ts, status). The last call overwrites commit_ts_and_status_ entirely, clearing the flush bit that SetCkptTs just set. This leaves the entry dirty without an OnCommittedUpdate increment. Affects CatalogCcMap and RangeCcMap (ObjectCcMap overrides BackFill correctly).

Also missing OnCommittedUpdate in the ReadOutsideCc backfill path.

Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, then OnFlushed + OnCommittedUpdate. Add OnCommittedUpdate in ReadOutsideCc path.

Bug #2

cluster_config_cc_map.h — missing OnCommittedUpdate

Two sites call SetCommitTsPayloadStatus() without OnCommittedUpdate(), making the entry dirty without incrementing the counter.

Fix: Capture was_dirty before and call OnCommittedUpdate after both sites.

Assertion relaxation

cc_req_misc.cppUpdateCceCkptTsCc assertions

Relaxed 4 assertions from assert(v_entry->CommitTs() > 1 && !v_entry->IsPersistent()) to assert(v_entry->CommitTs() > 1). With the BackFill/ReadOutsideCc fixes, a concurrent backfill can legitimately mark an entry persistent before the checkpoint callback runs.

Files Changed

File Change
cc_request.h CanBeCleaned returns !GetBeingCkpt() for migration/bucket/range clean
template_cc_map.h BackFill reordered + OnCommittedUpdate in BackFill and ReadOutsideCc
cluster_config_cc_map.h OnCommittedUpdate at both SetCommitTsPayloadStatus sites
cc_req_misc.cpp Relaxed 4 assertions in UpdateCceCkptTsCc
cc_shard.cpp Removed diagnostic logging (kept assertion + clamp)

Testing

Verified by running cluster_scale_test.py (all 5 tests) dozens of times with no assertion failures.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved checkpoint timestamp handling and callback execution order for better data consistency.
    • Refined entry cleanup eligibility logic to prevent operations on entries currently being checkpointed.
  • Chores

    • Simplified internal conditions and removed explanatory comments for cleaner code maintenance.

@coderabbitai

coderabbitai Bot commented Mar 18, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fbf33ee1-d63b-43d7-8ce7-8255d5c01f5e

📥 Commits

Reviewing files that changed from the base of the PR and between 807e707 and 914c061.

📒 Files selected for processing (5)
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/cluster_config_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/cc/cc_shard.cpp
💤 Files with no reviewable changes (1)
  • tx_service/src/cc/cc_shard.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • tx_service/src/cc/cc_req_misc.cpp

Walkthrough

This PR reorders and enhances commit/flush/checkpoint operations across multiple cache management components, adds OnCommittedUpdate callbacks after flush operations, makes cache entry cleanup eligibility conditional on checkpoint status, and loosens a debug assertion for persistent entries during checkpoint updates.

Changes

Cohort / File(s) Summary
Commit/Checkpoint Sequencing and Callbacks
tx_service/include/cc/template_cc_map.h, tx_service/include/cc/cluster_config_cc_map.h
Reorders commit-related mutations to execute SetCommitTsPayloadStatus before checkpoint timestamp updates, then calls OnFlushed followed by newly-added OnCommittedUpdate with captured was_dirty state in multiple code paths (normal backfill/commit flow, PostWrite writeback, ReplayLog).
Checkpoint Update and Dirty State Tracking
tx_service/src/cc/cc_req_misc.cpp
Removes && !v_entry->IsPersistent() predicate from assertions in checkpoint-update scan loop, so assertions now only verify v_entry->CommitTs() > 1 across all VersionedLruEntry instantiations.
Cache Entry Cleanup Eligibility
tx_service/include/cc/cc_request.h
Modifies KickoutCcEntryCc::CanBeCleaned() to conditionally return negation of GetBeingCkpt() for CleanRangeData, CleanRangeDataForMigration, and CleanBucketData cases, preventing cleanup of entries currently being checkpointed.
Code Cleanup
tx_service/src/cc/cc_shard.cpp
Removes four explanatory comments from AdjustDataKeyStats method while preserving all executable logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • liunyl
  • lokax
  • MrGuin

Poem

🐰 Timestamps dance in careful sequence,
Checkpoints flush with new deference,
Dirty states now captured true,
Callbacks bloom in ordered hue,
Cache keeps clean while systems brew!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description is comprehensive and well-structured, covering all three bugs, their root causes, fixes, design rationale, and testing. However, it does not explicitly reference a linked issue number using 'fixes eloqdb/tx_service#issue_id' format, nor does it confirm completion of the template checklist items. Add explicit issue reference using 'fixes eloqdb/tx_service#<issue_id>' format and confirm checklist items (tests added, documentation, test suite pass status) to fully align with the repository template.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: prevent dirty_data_key_count_ underflow from three sources' directly summarizes the main change—fixing an intermittent assertion failure caused by dirty_data_key_count_ underflow. It is concise, specific, and clearly communicates the primary objective of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/backfill-dirty-key-count-underflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

githubzilla added a commit to githubzilla/eloqkv that referenced this pull request Mar 18, 2026
…erflow

Update data_substrate submodule to include the fix for an intermittent
assertion failure in AdjustDataKeyStats where dirty_data_key_count_
underflows during cluster scale-out.

See eloqdata/tx_service#465 for details.
githubzilla added a commit to githubzilla/eloqkv that referenced this pull request Mar 18, 2026
…erflow

Update data_substrate submodule to include the fix for an intermittent
assertion failure in AdjustDataKeyStats where dirty_data_key_count_
underflows during cluster scale-out.

See eloqdata/tx_service#465 for details.
@githubzilla githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 60c8665 to 30351d6 Compare March 21, 2026 02:43
@githubzilla githubzilla changed the title fix: reorder BackFill operations to prevent dirty_data_key_count_ underflow fix: prevent dirty_data_key_count_ underflow from three sources Mar 21, 2026
@githubzilla githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 30351d6 to becb66e Compare March 21, 2026 03:30
githubzilla added a commit to githubzilla/eloqkv that referenced this pull request Mar 21, 2026
…erflow

Update data_substrate submodule to include the fix for an intermittent
assertion failure in AdjustDataKeyStats where dirty_data_key_count_
underflows during cluster scale-out.

See eloqdata/tx_service#465 for details.
Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could
free CcEntries with BeingCkpt=true, causing dirty count double-decrement when
the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry.
Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData,
CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed
are skipped and retried later.

Bug #1: TemplateCcMap::BackFill called SetCkptTs() before
SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and
clears the flush bit, leaving the entry dirty without incrementing the
counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path.
Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add
OnCommittedUpdate in both BackFill and ReadOutsideCc paths.

Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites
without OnCommittedUpdate(), making entries dirty without counting them.
Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls.

Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true,
since concurrent BackFill/ReadOutsideCc can legitimately mark an entry
persistent before the checkpoint callback runs.
@githubzilla githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 70ebf56 to 914c061 Compare March 24, 2026 04:42
githubzilla added a commit to githubzilla/eloqkv that referenced this pull request Mar 24, 2026
…erflow

Update data_substrate submodule to include the fix for an intermittent
assertion failure in AdjustDataKeyStats where dirty_data_key_count_
underflows during cluster scale-out.

See eloqdata/tx_service#465 for details.
liunyl added a commit that referenced this pull request Jun 13, 2026
Two distinct bugs were chained under the quarantined LargeObjLRU-Test (the
FastMetaDataMutex tls_shard_idx spin was fixed earlier in this branch):

1. Engine bug (template_cc_map.h, FindEmplace, LO_LRU policy): when the
   large-object page is the LAST page in the map and a key greater than it
   is inserted, the `target_it == ccmp_.end()` branch creates the new
   next-page but never repoints `target_page` to it (the parallel `else`
   branch does). The key is then Emplaced back into the large-object page,
   and TryUpdatePageKey calls FirstKey() on the still-empty new page ->
   assert(!keys_.empty()) (TC-FE-01). In NDEBUG this is worse: FirstKey()
   reads keys_.front() on an empty vector (UB) and the large-object
   "alone on its page" invariant is violated. Fix: add the missing
   `target_page = target_it->second.get();`, mirroring the else branch.

2. Test-setup bug (LargeObjLRU-Test.cpp, 4 sites): tests create partial-
   commit dirty entries and compensated dirty_data_key_count_ via
   f.shard.AdjustDataKeyStats(...) -- the SHARD counter only. OnCommittedUpdate
   normally bumps both the shard and the MAP counters, so the map's
   dirty_data_key_count_ was left at 0 while the entries are dirty, and
   Terminate()'s decrement underflowed it (assert in
   TemplateCcMap::AdjustDataKeyStats). This surfaced only once bug #1 stopped
   killing the suite earlier. Fix: drive the real clean->dirty API
   (cc_map.OnCommittedUpdate) so both counters stay in step. (This is
   unrelated to the production dirty-count underflow paths in #465.)

With both fixes the full LargeObjLRU-Test (23 cases, 3701 assertions) passes
deterministically, so it is un-gated in CI (only ClusterCrossNg-Test remains
non-gating). Full ctest suite: 40/40 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@liunyl liunyl merged commit dc37e86 into main Jun 15, 2026
4 checks passed
@liunyl liunyl deleted the fix/backfill-dirty-key-count-underflow branch June 15, 2026 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants