fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_ by MrGuin · Pull Request #307 · eloqdata/tx_service

MrGuin · 2025-12-18T11:42:35Z

Fixes https://github.com/eloqdata/project_tracker/issues/109.
May also address at least one cause of https://github.com/eloqdata/project_tracker/issues/61.

Summary by CodeRabbit

Bug Fixes
- Prevented incorrect write-lock timestamp updates when the same transaction reacquires a write lock.
- Enforced stricter version checks and non-decreasing timestamp validation when reusing transaction entries to ensure consistency.
Chores
- Added diagnostic logging for transaction initialization errors to aid debugging.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-18T11:42:47Z

Walkthrough

Updated transaction coordination and cache-entry handling: write-lock timestamp update now requires existing timestamp to be zero, AddObjectCommand enforces stricter version and non-decreasing timestamp checks when reusing entries, and ReplayLogCc::Execute now logs InitCcm errors before existing error handling.

Changes

Cohort / File(s)	Summary
Write Lock Timestamp Update Logic `tx_service/src/cc/cc_shard.cpp`	In `CcShard::UpsertLockHoldingTx`, updating `wlock_ts_` now only occurs if the existing `wlock_ts_` is zero, preventing updates when the same transaction re-acquires a write lock.
Cache Entry Field Validation `tx_service/include/command_set.h`	In `AddObjectCommand`, when reusing an existing `CcEntry`, require `object_version` to be 0 or equal to incoming `cce_version`, update `object_version`, and assert & assign that `lock_ts_` and `last_vali_ts_` are non-decreasing relative to existing values. Added comment about multi-command updates to the same entry.
Error Logging in Replay `tx_service/include/cc/cc_request.h`	In `ReplayLogCc::Execute`, when `InitCcm` returns an error, log an error message `InitCcm err in ReplayLogCc: <error_message>` before proceeding with existing error handling paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Review CcShard::UpsertLockHoldingTx for concurrency implications where legitimate timestamp refreshes might be needed.
Validate AddObjectCommand assertions won't false-positive under concurrent commands operating on the same cache entry; inspect callers that may rely on previous, looser semantics.
Confirm the new log line in ReplayLogCc::Execute is safe in error paths and does not leak sensitive data.

Possibly related PRs

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build #154 — Modifies CcShard::UpsertLockHoldingTx (adds null-check before emplacing cce_ptr); directly related to the write-lock timestamp update changes.

Suggested reviewers

liunyl
thweetkomputer

Poem

🐰 Hopping through locks with careful pace,

I patch the timestamps, tidy the place.
I nudge a log where errors creep,
And guard shared entries while you sleep.
A tiny hop — consistency in grace.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description references issue links but lacks implementation details, testing information, documentation notes, and RFC references required by the template.	Add details about changes made, confirm tests added, document changes, and verify passing the specified test suite per template checklist.
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly describes the main changes: fixes to lock_hold_tx wlock_ts_ and CmdSet's wlock_ts_ handling, which aligns with all three files modified in the changeset.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix_commit_ts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tx_service/include/cc/cc_request.h (1)

4726-4745: Good addition of error logging; consider adding more context.

The new LOG(ERROR) on InitCcm failure in ReplayLogCc::Execute is helpful and does not change control flow. For easier debugging of replay issues, you might also log the table name and node_group_id_ (and possibly ng_term_) so operators can correlate errors to specific shards/tables.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f74a44e and 8bba05b.

📒 Files selected for processing (3)

tx_service/include/cc/cc_request.h (1 hunks)
tx_service/include/command_set.h (1 hunks)
tx_service/src/cc/cc_shard.cpp (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-10-09T03:56:58.811Z

Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

tx_service/src/cc/cc_shard.cpp

📚 Learning: 2025-12-02T10:43:27.431Z

Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

tx_service/src/cc/cc_shard.cpp

🔇 Additional comments (2)

tx_service/include/command_set.h (1)

94-103: Updating entry timestamps on reuse looks correct; verify monotonicity assumptions.

Updating object_version_, lock_ts_, and last_vali_ts_ when an existing CmdSetEntry is reused fixes the stale-state issue and keeps the command set in sync with the latest lock state. The added assertions that cce_version, lock_ts, and last_vali_ts are non-decreasing are reasonable invariants for version/timestamp fields.

It would be good to double-check that all call sites (including recovery/replay paths and any lock re-acquisition/upgrade flows) never legitimately pass a lower lock_ts or last_vali_ts; otherwise these debug-only asserts could start firing even though production behavior is fine.

tx_service/src/cc/cc_shard.cpp (1)

775-783: Initialization of wlock_ts_ is properly handled — code change is correct.

The TxLockInfo constructor (line 133) and Reset method (line 154) both correctly initialize wlock_ts_ to 0. The change ensures that wlock_ts_ captures the timestamp of the first write lock acquisition and remains unchanged on subsequent acquisitions by the same transaction. This preserves the initial write lock timestamp needed for checkpoint timestamp calculations (as confirmed by usage at lines 535-537).

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tx_service/include/command_set.h (1)
95-103: Consider using always-on runtime checks instead of assert() for these critical invariants.

The invariants being enforced here (version consistency and non-decreasing timestamps) are important for correctness—this is a bug fix PR after all. However, assert() is typically compiled out in release builds when NDEBUG is defined. If these invariants are violated in production, the code will silently proceed with potentially incorrect values, which could lead to data corruption in transaction coordination.

Consider using a runtime check that logs and handles the failure, or a macro that remains active in release builds:
🔎 Suggested approach using runtime checks
         CmdSetEntry &entry = cce_it->second;
         // Update the entry fields since a txn might have many commands that
         // operate on the same cce.
-        assert(entry.object_version_ == 0 ||
-               entry.object_version_ == cce_version);
+        if (entry.object_version_ != 0 && entry.object_version_ != cce_version)
+        {
+            LOG(ERROR) << "AddObjectCommand version mismatch: existing="
+                       << entry.object_version_ << " incoming=" << cce_version;
+            // Handle error appropriately - return early, throw, or abort
+        }
         entry.object_version_ = cce_version;
-        assert(lock_ts >= entry.lock_ts_);
+        if (lock_ts < entry.lock_ts_)
+        {
+            LOG(ERROR) << "AddObjectCommand lock_ts regressed: existing="
+                       << entry.lock_ts_ << " incoming=" << lock_ts;
+        }
         entry.lock_ts_ = lock_ts;
-        assert(last_vali_ts >= entry.last_vali_ts_);
+        if (last_vali_ts < entry.last_vali_ts_)
+        {
+            LOG(ERROR) << "AddObjectCommand last_vali_ts regressed: existing="
+                       << entry.last_vali_ts_ << " incoming=" << last_vali_ts;
+        }
         entry.last_vali_ts_ = last_vali_ts;
Alternatively, if crashing is acceptable for invariant violations, use a custom macro like CHECK() (from butil/logging.h) that remains active in release builds.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bba05b and 8985bc9.

📒 Files selected for processing (1)

tx_service/include/command_set.h (1 hunks)

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_

8bba05b

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

MrGuin requested a review from liunyl December 19, 2025 04:57

fix

8985bc9

coderabbitai Bot reviewed Dec 19, 2025

View reviewed changes

liunyl approved these changes Dec 19, 2025

View reviewed changes

MrGuin merged commit 31168cc into main Dec 19, 2025
4 checks passed

MrGuin deleted the fix_commit_ts branch December 19, 2025 08:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_#307

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_#307
MrGuin merged 2 commits into
mainfrom
fix_commit_ts

MrGuin commented Dec 18, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MrGuin commented Dec 18, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MrGuin commented Dec 18, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 18, 2025 •

edited

Loading