Skip to content

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_#307

Merged
MrGuin merged 2 commits into
mainfrom
fix_commit_ts
Dec 19, 2025
Merged

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_#307
MrGuin merged 2 commits into
mainfrom
fix_commit_ts

Conversation

@MrGuin

@MrGuin MrGuin commented Dec 18, 2025

Copy link
Copy Markdown
Collaborator

Fixes https://github.com/eloqdata/project_tracker/issues/109.
May also address at least one cause of https://github.com/eloqdata/project_tracker/issues/61.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented incorrect write-lock timestamp updates when the same transaction reacquires a write lock.
    • Enforced stricter version checks and non-decreasing timestamp validation when reusing transaction entries to ensure consistency.
  • Chores

    • Added diagnostic logging for transaction initialization errors to aid debugging.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 18, 2025

Copy link
Copy Markdown

Walkthrough

Updated transaction coordination and cache-entry handling: write-lock timestamp update now requires existing timestamp to be zero, AddObjectCommand enforces stricter version and non-decreasing timestamp checks when reusing entries, and ReplayLogCc::Execute now logs InitCcm errors before existing error handling.

Changes

Cohort / File(s) Summary
Write Lock Timestamp Update Logic
tx_service/src/cc/cc_shard.cpp
In CcShard::UpsertLockHoldingTx, updating wlock_ts_ now only occurs if the existing wlock_ts_ is zero, preventing updates when the same transaction re-acquires a write lock.
Cache Entry Field Validation
tx_service/include/command_set.h
In AddObjectCommand, when reusing an existing CcEntry, require object_version to be 0 or equal to incoming cce_version, update object_version, and assert & assign that lock_ts_ and last_vali_ts_ are non-decreasing relative to existing values. Added comment about multi-command updates to the same entry.
Error Logging in Replay
tx_service/include/cc/cc_request.h
In ReplayLogCc::Execute, when InitCcm returns an error, log an error message InitCcm err in ReplayLogCc: <error_message> before proceeding with existing error handling paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review CcShard::UpsertLockHoldingTx for concurrency implications where legitimate timestamp refreshes might be needed.
  • Validate AddObjectCommand assertions won't false-positive under concurrent commands operating on the same cache entry; inspect callers that may rely on previous, looser semantics.
  • Confirm the new log line in ReplayLogCc::Execute is safe in error paths and does not leak sensitive data.

Possibly related PRs

Suggested reviewers

  • liunyl
  • thweetkomputer

Poem

🐰 Hopping through locks with careful pace,

I patch the timestamps, tidy the place.
I nudge a log where errors creep,
And guard shared entries while you sleep.
A tiny hop — consistency in grace.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description references issue links but lacks implementation details, testing information, documentation notes, and RFC references required by the template. Add details about changes made, confirm tests added, document changes, and verify passing the specified test suite per template checklist.
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main changes: fixes to lock_hold_tx wlock_ts_ and CmdSet's wlock_ts_ handling, which aligns with all three files modified in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix_commit_ts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tx_service/include/cc/cc_request.h (1)

4726-4745: Good addition of error logging; consider adding more context.

The new LOG(ERROR) on InitCcm failure in ReplayLogCc::Execute is helpful and does not change control flow. For easier debugging of replay issues, you might also log the table name and node_group_id_ (and possibly ng_term_) so operators can correlate errors to specific shards/tables.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f74a44e and 8bba05b.

📒 Files selected for processing (3)
  • tx_service/include/cc/cc_request.h (1 hunks)
  • tx_service/include/command_set.h (1 hunks)
  • tx_service/src/cc/cc_shard.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
🔇 Additional comments (2)
tx_service/include/command_set.h (1)

94-103: Updating entry timestamps on reuse looks correct; verify monotonicity assumptions.

Updating object_version_, lock_ts_, and last_vali_ts_ when an existing CmdSetEntry is reused fixes the stale-state issue and keeps the command set in sync with the latest lock state. The added assertions that cce_version, lock_ts, and last_vali_ts are non-decreasing are reasonable invariants for version/timestamp fields.

It would be good to double-check that all call sites (including recovery/replay paths and any lock re-acquisition/upgrade flows) never legitimately pass a lower lock_ts or last_vali_ts; otherwise these debug-only asserts could start firing even though production behavior is fine.

tx_service/src/cc/cc_shard.cpp (1)

775-783: Initialization of wlock_ts_ is properly handled — code change is correct.

The TxLockInfo constructor (line 133) and Reset method (line 154) both correctly initialize wlock_ts_ to 0. The change ensures that wlock_ts_ captures the timestamp of the first write lock acquisition and remains unchanged on subsequent acquisitions by the same transaction. This preserves the initial write lock timestamp needed for checkpoint timestamp calculations (as confirmed by usage at lines 535-537).

@MrGuin MrGuin requested a review from liunyl December 19, 2025 04:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tx_service/include/command_set.h (1)

95-103: Consider using always-on runtime checks instead of assert() for these critical invariants.

The invariants being enforced here (version consistency and non-decreasing timestamps) are important for correctness—this is a bug fix PR after all. However, assert() is typically compiled out in release builds when NDEBUG is defined. If these invariants are violated in production, the code will silently proceed with potentially incorrect values, which could lead to data corruption in transaction coordination.

Consider using a runtime check that logs and handles the failure, or a macro that remains active in release builds:

🔎 Suggested approach using runtime checks
         CmdSetEntry &entry = cce_it->second;
         // Update the entry fields since a txn might have many commands that
         // operate on the same cce.
-        assert(entry.object_version_ == 0 ||
-               entry.object_version_ == cce_version);
+        if (entry.object_version_ != 0 && entry.object_version_ != cce_version)
+        {
+            LOG(ERROR) << "AddObjectCommand version mismatch: existing="
+                       << entry.object_version_ << " incoming=" << cce_version;
+            // Handle error appropriately - return early, throw, or abort
+        }
         entry.object_version_ = cce_version;
-        assert(lock_ts >= entry.lock_ts_);
+        if (lock_ts < entry.lock_ts_)
+        {
+            LOG(ERROR) << "AddObjectCommand lock_ts regressed: existing="
+                       << entry.lock_ts_ << " incoming=" << lock_ts;
+        }
         entry.lock_ts_ = lock_ts;
-        assert(last_vali_ts >= entry.last_vali_ts_);
+        if (last_vali_ts < entry.last_vali_ts_)
+        {
+            LOG(ERROR) << "AddObjectCommand last_vali_ts regressed: existing="
+                       << entry.last_vali_ts_ << " incoming=" << last_vali_ts;
+        }
         entry.last_vali_ts_ = last_vali_ts;

Alternatively, if crashing is acceptable for invariant violations, use a custom macro like CHECK() (from butil/logging.h) that remains active in release builds.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bba05b and 8985bc9.

📒 Files selected for processing (1)
  • tx_service/include/command_set.h (1 hunks)

@MrGuin MrGuin merged commit 31168cc into main Dec 19, 2025
4 checks passed
@MrGuin MrGuin deleted the fix_commit_ts branch December 19, 2025 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants