fix: prevent dirty_data_key_count_ underflow from three sources by githubzilla · Pull Request #465 · eloqdata/tx_service

githubzilla · 2026-03-18T15:00:21Z

Summary

Fix an intermittent assertion failure in CcShard::AdjustDataKeyStats:

Assertion `dirty_delta >= 0 || dirty_data_key_count_ >= static_cast<size_t>(-dirty_delta)' failed.

Three bugs identified and fixed, all causing dirty_data_key_count_ to become inconsistent:

Bug #3 (confirmed root cause via core dump + diagnostic logs)

cc_request.h — CanBeCleaned for CleanBucketData/CleanRangeData/CleanRangeDataForMigration

During bucket migration, CleanBucketData could free CcEntries that have BeingCkpt=true (being checkpointed). This causes dirty count to be decremented during cleanup, and then decremented again when UpdateCceCkptTsCc callback runs — a double-decrement that underflows.

Root cause sequence:

Checkpoint collects entry, sets BeingCkpt=true, starts async flush
CleanBucketData runs (under WriteLock), CanBeCleaned returns true unconditionally, entry freed, dirty_freed_cnt decremented
UpdateCceCkptTsCc callback runs on CcShard — second decrement — underflow → assertion crash

Fix: CanBeCleaned() now checks !GetBeingCkpt() for these three clean types. Entries being checkpointed are skipped; clean_success_ is set to false, causing Execute(KickoutCcEntryCc&) in template_cc_map.h to break out of the page loop and re-enqueue the request to the same CC shard. The entry is cleaned on a subsequent retry once checkpoint clears BeingCkpt.

Design rationale — why not add bucket locks to DataSyncForHashPartition?

An alternative fix would be to have DataSyncForHashPartition acquire bucket read locks (as DataSyncForRangePartition already does), serializing checkpoint with migration and preventing the race entirely. However, this has significant downsides for hash partitions: hash checkpoint is dispatched per-core, and each core owns 1024 / num_cores buckets (e.g. 128 on an 8-core node), so locking all of them via ReadTxRequest adds overhead to every checkpoint cycle even when no migration is in progress. The current approach (skip + retry) only adds latency to migration when there is an active checkpoint on the same bucket — a rare overlap in practice.

Bug #1

template_cc_map.h — BackFill wrong operation order + missing OnCommittedUpdate

BackFill called SetCkptTs(commit_ts) → OnFlushed() → SetCommitTsPayloadStatus(commit_ts, status). The last call overwrites commit_ts_and_status_ entirely, clearing the flush bit that SetCkptTs just set. This leaves the entry dirty without an OnCommittedUpdate increment. Affects CatalogCcMap and RangeCcMap (ObjectCcMap overrides BackFill correctly).

Also missing OnCommittedUpdate in the ReadOutsideCc backfill path.

Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, then OnFlushed + OnCommittedUpdate. Add OnCommittedUpdate in ReadOutsideCc path.

Bug #2

cluster_config_cc_map.h — missing OnCommittedUpdate

Two sites call SetCommitTsPayloadStatus() without OnCommittedUpdate(), making the entry dirty without incrementing the counter.

Fix: Capture was_dirty before and call OnCommittedUpdate after both sites.

Assertion relaxation

cc_req_misc.cpp — UpdateCceCkptTsCc assertions

Relaxed 4 assertions from assert(v_entry->CommitTs() > 1 && !v_entry->IsPersistent()) to assert(v_entry->CommitTs() > 1). With the BackFill/ReadOutsideCc fixes, a concurrent backfill can legitimately mark an entry persistent before the checkpoint callback runs.

Files Changed

File	Change
`cc_request.h`	`CanBeCleaned` returns `!GetBeingCkpt()` for migration/bucket/range clean
`template_cc_map.h`	BackFill reordered + `OnCommittedUpdate` in BackFill and ReadOutsideCc
`cluster_config_cc_map.h`	`OnCommittedUpdate` at both `SetCommitTsPayloadStatus` sites
`cc_req_misc.cpp`	Relaxed 4 assertions in `UpdateCceCkptTsCc`
`cc_shard.cpp`	Removed diagnostic logging (kept assertion + clamp)

Testing

Verified by running cluster_scale_test.py (all 5 tests) dozens of times with no assertion failures.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved checkpoint timestamp handling and callback execution order for better data consistency.
- Refined entry cleanup eligibility logic to prevent operations on entries currently being checkpointed.
Chores
- Simplified internal conditions and removed explanatory comments for cleaner code maintenance.

coderabbitai · 2026-03-18T15:00:44Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fbf33ee1-d63b-43d7-8ce7-8255d5c01f5e

📥 Commits

Reviewing files that changed from the base of the PR and between 807e707 and 914c061.

📒 Files selected for processing (5)

tx_service/include/cc/cc_request.h
tx_service/include/cc/cluster_config_cc_map.h
tx_service/include/cc/template_cc_map.h
tx_service/src/cc/cc_req_misc.cpp
tx_service/src/cc/cc_shard.cpp

💤 Files with no reviewable changes (1)

tx_service/src/cc/cc_shard.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

tx_service/src/cc/cc_req_misc.cpp

Walkthrough

This PR reorders and enhances commit/flush/checkpoint operations across multiple cache management components, adds OnCommittedUpdate callbacks after flush operations, makes cache entry cleanup eligibility conditional on checkpoint status, and loosens a debug assertion for persistent entries during checkpoint updates.

Changes

Cohort / File(s)	Summary
Commit/Checkpoint Sequencing and Callbacks `tx_service/include/cc/template_cc_map.h`, `tx_service/include/cc/cluster_config_cc_map.h`	Reorders commit-related mutations to execute `SetCommitTsPayloadStatus` before checkpoint timestamp updates, then calls `OnFlushed` followed by newly-added `OnCommittedUpdate` with captured `was_dirty` state in multiple code paths (normal backfill/commit flow, PostWrite writeback, ReplayLog).
Checkpoint Update and Dirty State Tracking `tx_service/src/cc/cc_req_misc.cpp`	Removes `&& !v_entry->IsPersistent()` predicate from assertions in checkpoint-update scan loop, so assertions now only verify `v_entry->CommitTs() > 1` across all `VersionedLruEntry` instantiations.
Cache Entry Cleanup Eligibility `tx_service/include/cc/cc_request.h`	Modifies `KickoutCcEntryCc::CanBeCleaned()` to conditionally return negation of `GetBeingCkpt()` for `CleanRangeData`, `CleanRangeDataForMigration`, and `CleanBucketData` cases, preventing cleanup of entries currently being checkpointed.
Code Cleanup `tx_service/src/cc/cc_shard.cpp`	Removes four explanatory comments from `AdjustDataKeyStats` method while preserving all executable logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Fix underflow dirty size #401: Modifies the same commit/flush handling in TemplateCcMap with OnCommittedUpdate/OnFlushed callbacks and per-entry dirty state adjustments in cc_req_misc.
fix standby cce eviction and forward msg #384: Changes template_cc_map.h's backfill/commit handling with reordering of commit/checkpoint updates and per-entry commit-status modifications.
remove schema modification in clean_ccm_op_ #236: Modifies KickoutCcEntryCc behavior in cc_request.h for CCM schema-update paths.

Suggested reviewers

liunyl
lokax
MrGuin

Poem

🐰 Timestamps dance in careful sequence,
Checkpoints flush with new deference,
Dirty states now captured true,
Callbacks bloom in ordered hue,
Cache keeps clean while systems brew!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description is comprehensive and well-structured, covering all three bugs, their root causes, fixes, design rationale, and testing. However, it does not explicitly reference a linked issue number using 'fixes eloqdb/tx_service#issue_id' format, nor does it confirm completion of the template checklist items.	Add explicit issue reference using 'fixes eloqdb/tx_service#<issue_id>' format and confirm checklist items (tests added, documentation, test suite pass status) to fully align with the repository template.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: prevent dirty_data_key_count_ underflow from three sources' directly summarizes the main change—fixing an intermittent assertion failure caused by dirty_data_key_count_ underflow. It is concise, specific, and clearly communicates the primary objective of the PR.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/backfill-dirty-key-count-underflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…erflow Update data_substrate submodule to include the fix for an intermittent assertion failure in AdjustDataKeyStats where dirty_data_key_count_ underflows during cluster scale-out. See eloqdata/tx_service#465 for details.

Bug #3 (root cause of assertion crash): CleanBucketData/CleanRangeData could free CcEntries with BeingCkpt=true, causing dirty count double-decrement when the checkpoint callback (UpdateCceCkptTsCc) later runs on the freed entry. Fix: CanBeCleaned() now returns !GetBeingCkpt() for CleanBucketData, CleanRangeData, and CleanRangeDataForMigration. Entries being checkpointed are skipped and retried later. Bug #1: TemplateCcMap::BackFill called SetCkptTs() before SetCommitTsPayloadStatus(), which overwrites commit_ts_and_status_ and clears the flush bit, leaving the entry dirty without incrementing the counter. Also missing OnCommittedUpdate in ReadOutsideCc backfill path. Fix: Reorder to SetCommitTsPayloadStatus first, then SetCkptTs, and add OnCommittedUpdate in both BackFill and ReadOutsideCc paths. Bug #2: ClusterConfigCcMap called SetCommitTsPayloadStatus() at two sites without OnCommittedUpdate(), making entries dirty without counting them. Fix: Add OnCommittedUpdate after both SetCommitTsPayloadStatus calls. Also relax UpdateCceCkptTsCc assertions to allow IsPersistent() being true, since concurrent BackFill/ReadOutsideCc can legitimately mark an entry persistent before the checkpoint callback runs.

…erflow Update data_substrate submodule to include the fix for an intermittent assertion failure in AdjustDataKeyStats where dirty_data_key_count_ underflows during cluster scale-out. See eloqdata/tx_service#465 for details.

Two distinct bugs were chained under the quarantined LargeObjLRU-Test (the FastMetaDataMutex tls_shard_idx spin was fixed earlier in this branch): 1. Engine bug (template_cc_map.h, FindEmplace, LO_LRU policy): when the large-object page is the LAST page in the map and a key greater than it is inserted, the `target_it == ccmp_.end()` branch creates the new next-page but never repoints `target_page` to it (the parallel `else` branch does). The key is then Emplaced back into the large-object page, and TryUpdatePageKey calls FirstKey() on the still-empty new page -> assert(!keys_.empty()) (TC-FE-01). In NDEBUG this is worse: FirstKey() reads keys_.front() on an empty vector (UB) and the large-object "alone on its page" invariant is violated. Fix: add the missing `target_page = target_it->second.get();`, mirroring the else branch. 2. Test-setup bug (LargeObjLRU-Test.cpp, 4 sites): tests create partial- commit dirty entries and compensated dirty_data_key_count_ via f.shard.AdjustDataKeyStats(...) -- the SHARD counter only. OnCommittedUpdate normally bumps both the shard and the MAP counters, so the map's dirty_data_key_count_ was left at 0 while the entries are dirty, and Terminate()'s decrement underflowed it (assert in TemplateCcMap::AdjustDataKeyStats). This surfaced only once bug #1 stopped killing the suite earlier. Fix: drive the real clean->dirty API (cc_map.OnCommittedUpdate) so both counters stay in step. (This is unrelated to the production dirty-count underflow paths in #465.) With both fixes the full LargeObjLRU-Test (23 cases, 3701 assertions) passes deterministically, so it is un-gated in CI (only ClusterCrossNg-Test remains non-gating). Full ctest suite: 40/40 green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

githubzilla mentioned this pull request Mar 18, 2026

fix: update data_substrate — prevent dirty_data_key_count_ underflow eloqdata/eloqkv#458

Closed

githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 60c8665 to 30351d6 Compare March 21, 2026 02:43

githubzilla changed the title ~~fix: reorder BackFill operations to prevent dirty_data_key_count_ underflow~~ fix: prevent dirty_data_key_count_ underflow from three sources Mar 21, 2026

githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 30351d6 to becb66e Compare March 21, 2026 03:30

githubzilla added 2 commits March 24, 2026 12:42

Update comment

914c061

githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from 70ebf56 to 914c061 Compare March 24, 2026 04:42

liunyl approved these changes Jun 15, 2026

View reviewed changes

liunyl merged commit dc37e86 into main Jun 15, 2026
4 checks passed

liunyl deleted the fix/backfill-dirty-key-count-underflow branch June 15, 2026 02:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: prevent dirty_data_key_count_ underflow from three sources#465

fix: prevent dirty_data_key_count_ underflow from three sources#465
liunyl merged 2 commits into
mainfrom
fix/backfill-dirty-key-count-underflow

githubzilla commented Mar 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 18, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

githubzilla commented Mar 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug #3 (confirmed root cause via core dump + diagnostic logs)

Bug #1

Bug #2

Assertion relaxation

Files Changed

Testing

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

githubzilla commented Mar 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 18, 2026 •

edited

Loading