obj_cmd_op read catalog UpsertLockHodlingTxn in debug build by MrGuin · Pull Request #154 · eloqdata/tx_service

MrGuin · 2025-10-09T10:06:13Z

Here are some reminders before you submit the pull request

Add tests for the change
Document changes
Reference the link of issue using fixes eloqdb/tx_service#issue_id
Reference the link of RFC if exists
Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

Bug Fixes
- Prevented insertion of null internal entries to reduce rare crashes and undefined behavior.
Refactor
- Made catalog read lock release transaction-aware via a public API update to improve correctness and traceability.
Chores
- Added debug-only instrumentation and expanded debug logging around locks, blocking/resume events, and buffered-command handling to aid troubleshooting without affecting production.

coderabbitai · 2025-10-09T10:06:40Z

Walkthrough

Propagates TxNumber through catalog read release APIs and call sites, adds debug-only instrumentation to record/delete transactions holding read locks, introduces extra debug/info logging in acquire paths, and prevents inserting null pointers into CcShard's cce_list_.

Changes

Cohort / File(s)	Summary of changes
API signature updates (headers) `include/cc/local_cc_handler.h`, `include/cc/non_blocking_lock.h`	`LocalCcHandler::ReleaseCatalogRead` and `NonBlockingLock::ReleaseReadLockFast` now accept an additional `TxNumber tx_number` parameter.
Call-site propagation `src/tx_execution.cpp`, `src/cc/local_cc_handler.cpp`	Compute/preserve `tx_number` and pass it into `LocalCcHandler::ReleaseCatalogRead(..., tx_number)` and onward to `ReleaseReadLockFast`.
Debug-only lock tracking `include/cc/catalog_cc_map.h`, `src/cc/non_blocking_lock.cpp`	Under `#ifndef NDEBUG`, call `UpsertLockHoldingTx` when acquiring and `DeleteLockHoldingTx` when releasing to track the transaction holding the lock.
Logging / observability `include/cc/template_cc_map.h`, `include/cc/object_cc_map.h`, `include/cc/non_blocking_lock.h`	Added debug/info logs for resumed/blocked acquisitions, buffered-command details (including cce key), and a DLOG in read-conflict path.
Safety fix in shard list handling `src/cc/cc_shard.cpp`	Guarded `cce_list_.emplace_back(cce_ptr)` with a null check to avoid inserting `nullptr`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Tx as Transaction
  participant LCH as LocalCcHandler
  participant NBL as NonBlockingLock
  participant Shard as CcShard

  rect rgba(230,240,255,0.35)
  note over Tx,Shard: Catalog read acquire (existing)
  Tx->>LCH: AcquireCatalogRead(...)
  LCH->>NBL: AcquireReadLock(...)
  NBL-->>LCH: Lock acquired
  alt Debug build
    LCH->>Shard: UpsertLockHoldingTx(tx_number,...)
  end
  end

  rect rgba(230,255,230,0.35)
  note over Tx,Shard: Catalog read release (propagated tx_number)
  Tx->>LCH: ReleaseCatalogRead(lock, tx_number)
  LCH->>NBL: ReleaseReadLockFast(Shard, tx_number)
  alt Debug build
    NBL->>Shard: DeleteLockHoldingTx(tx_number,...)
  end
  NBL-->>LCH: Released / may unblock
  LCH-->>Tx: Result
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

liunyl

Poem

I hop through debug tunnels bright,
I note each tx that holds the light.
I skip null paws from the cce trail,
Log whispered hops where tests prevail.
A crunchy carrot for each byte. 🐇🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description only repeats the repository’s reminder checklist and does not include any summary of the changes, motivation, implementation details, tests, documentation updates, or references to related issues or RFCs.	Please complete the description by summarizing the changes made, explaining why they were introduced, linking any relevant issue or RFC, listing testing steps or test suite results, and noting any documentation updates.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title Check	❓ Inconclusive	The title “obj_cmd_op read catalog UpsertLockHodlingTxn in debug build” is unclear and contains a typo, uses internal shorthand, and does not concisely describe the primary change of adding debug-only instrumentation to record lock-holding transactions in the catalog read path.	Please revise the title to a clear, concise summary of the main change, for example “Record transaction holding catalog read lock in debug builds” or “Add debug instrumentation for catalog read lock-holding transactions.”

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch track_catalog_read

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

include/cc/catalog_cc_map.h (1)

2367-2374: Derive node group from tx_number to match Delete path; avoid stale debug entries.

DeleteLockHoldingTx is called with cc_ng_id derived from tx_number ((tx_number >> 32) >> 10). Here Upsert uses node_group_id, which can differ. For symmetry and to prevent leaks in debug tracking, derive cc_ng_id from tx_number.

Apply:

 #ifndef NDEBUG
-            shard_->UpsertLockHoldingTx(tx_number,
-                                        ng_term,
-                                        nullptr,
-                                        false,
-                                        node_group_id,
-                                        table_name_.Type());
+            const NodeGroupId tx_ng =
+                static_cast<NodeGroupId>(((tx_number >> 32L) >> 10));
+            shard_->UpsertLockHoldingTx(tx_number,
+                                        ng_term,   // tx_term unused for reads
+                                        nullptr,
+                                        false,
+                                        tx_ng,
+                                        table_name_.Type());
 #endif

Also confirm ng_term is the intended second parameter for reads (tx_coord_term is only used for write recovery).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 055e500 and 9cd39ad.

📒 Files selected for processing (7)

include/cc/catalog_cc_map.h (1 hunks)
include/cc/local_cc_handler.h (1 hunks)
include/cc/non_blocking_lock.h (1 hunks)
src/cc/cc_shard.cpp (1 hunks)
src/cc/local_cc_handler.cpp (1 hunks)
src/cc/non_blocking_lock.cpp (1 hunks)
src/tx_execution.cpp (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (5)

include/cc/local_cc_handler.h (1)

src/cc/local_cc_handler.cpp (2)

ReleaseCatalogRead (677-681)

ReleaseCatalogRead (677-677)

include/cc/non_blocking_lock.h (1)

src/cc/non_blocking_lock.cpp (2)

ReleaseReadLockFast (276-289)

ReleaseReadLockFast (276-276)

src/cc/non_blocking_lock.cpp (1)

include/cc/non_blocking_lock.h (1)

read_cnt_ (356-356)

include/cc/catalog_cc_map.h (3)

include/tx_operation.h (2)

table_name_ (151-152)

table_name_ (1065-1065)

include/tx_request.h (3)

table_name_ (750-1066)

table_name_ (780-781)

table_name_ (800-910)

include/remote/remote_cc_request.h (1)

table_name_ (808-809)

src/tx_execution.cpp (1)

include/tx_execution.h (1)

TxNumber (211-228)

🔇 Additional comments (6)

src/cc/local_cc_handler.cpp (1)

677-681: LGTM: Clean tx_number propagation for catalog read lock release.

The method signature change correctly adds the tx_number parameter and forwards it to ReleaseReadLockFast, enabling debug-only tracking of which transaction is releasing the catalog read lock.

src/cc/non_blocking_lock.cpp (1)

276-289: LGTM: Debug instrumentation correctly tracks catalog read lock releases.

The changes properly add debug-only tracking of catalog read lock releases by transaction number:

The signature change is minimal and correct.

The debug code is properly guarded with #ifndef NDEBUG, so production builds are unaffected.

The bit arithmetic (tx_number >> 32L) >> 10 correctly extracts the node group ID (consistent with the pattern used elsewhere, e.g., line 817 in local_cc_handler.cpp).

The nullptr parameter is appropriate for catalog reads, which use the fast path without a regular CcEntry.

src/tx_execution.cpp (1)

318-326: Scoped tx_number capture looks good.

Grabbing tx_number once avoids repeated atomic loads and safely propagates the context to ReleaseCatalogRead. No issues spotted.

src/cc/cc_shard.cpp (1)

735-738: Good fix: avoid inserting nullptr into cce_list_.

Prevents invalid set entries and aligns with DeleteLockHoldingTx(cce_ptr) semantics.

include/cc/non_blocking_lock.h (1)

134-134: ReleaseReadLockFast signature usages verified
LocalCcHandler::ReleaseCatalogRead correctly passes tx_number; no missing invocations found.

include/cc/local_cc_handler.h (1)

175-175: ReleaseCatalogRead callers pass tx_number All call sites now pass tx_number and the implementation correctly forwards it to ReleaseReadLockFast.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

include/cc/local_cc_handler.h (1)
175-175: Consider adding parameter documentation.

The new tx_number parameter lacks documentation explaining its purpose. Consider adding a comment to clarify that it identifies the transaction releasing the catalog read lock (used for debug instrumentation).

Example documentation addition above the function declaration:
/// <summary>
/// Releases a catalog read lock.
/// </summary>
/// <param name="lock">The lock to release</param>
/// <param name="tx_number">Transaction number for debug tracking</param>
/// <returns>True if the lock was successfully released</returns>
bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const;

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29ba1b9 and bdaf8ca.

📒 Files selected for processing (9)

include/cc/catalog_cc_map.h (1 hunks)
include/cc/local_cc_handler.h (1 hunks)
include/cc/non_blocking_lock.h (2 hunks)
include/cc/object_cc_map.h (1 hunks)
include/cc/template_cc_map.h (3 hunks)
src/cc/cc_shard.cpp (1 hunks)
src/cc/local_cc_handler.cpp (1 hunks)
src/cc/non_blocking_lock.cpp (1 hunks)
src/tx_execution.cpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

include/cc/non_blocking_lock.h
src/cc/non_blocking_lock.cpp
src/tx_execution.cpp
src/cc/cc_shard.cpp

🧰 Additional context used

🧬 Code graph analysis (2)

include/cc/local_cc_handler.h (1)

src/cc/local_cc_handler.cpp (2)

ReleaseCatalogRead (677-682)

ReleaseCatalogRead (677-678)

include/cc/catalog_cc_map.h (3)

include/tx_operation.h (2)

table_name_ (151-152)

table_name_ (1065-1065)

include/tx_request.h (3)

table_name_ (750-1066)

table_name_ (780-781)

table_name_ (800-910)

include/remote/remote_cc_request.h (1)

table_name_ (808-809)

🔇 Additional comments (4)

include/cc/object_cc_map.h (1)

526-526: LGTM! Helpful debug information added.

Adding the CCE key to the error log provides valuable context for debugging buffered command issues on the leader node. This complements the existing CommitTs in the log output.

src/cc/local_cc_handler.cpp (1)

677-681: LGTM! Parameter propagation is correct.

The signature change correctly adds the TxNumber tx_number parameter and forwards it to ReleaseReadLockFast. The implementation aligns with the header declaration and enables debug-only transaction tracking as intended by this PR.

include/cc/template_cc_map.h (2)

719-723: Good addition: debug resume trace

Adds helpful context (txn, table, key, lock, err) when resuming. Safe as DLOG.

932-935: Good addition: debug blocked trace

Extra DLOG on ACQUIRE_LOCK_BLOCKED will aid diagnosis without runtime cost in release.

coderabbitai · 2025-10-13T07:27:44Z

+#ifndef NDEBUG
+            shard_->UpsertLockHoldingTx(tx_number,
+                                        ng_term,
+                                        nullptr,
+                                        false,
+                                        node_group_id,
+                                        table_name_.Type());
+#endif


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the table type parameter in debug instrumentation.

The call to UpsertLockHoldingTx passes table_name_.Type(), which returns TableType::Catalog for this catalog CC map. However, the method parameter table_name represents the specific table being read.

Should this be table_name.Type() instead to track the actual table type being locked, or is the intention to record that the catalog map itself was accessed? The current implementation could lead to confusion during debugging if the intent is to track locks on specific table types.

Run the following script to examine how UpsertLockHoldingTx is used elsewhere and understand its expected semantics:

Based on learnings: The PR introduces debug instrumentation to track transactions holding read locks on catalog entries. Clarification is needed on whether tracking should use the catalog's type or the actual table's type.

🏁 Script executed:

#!/bin/bash # Description: Check UpsertLockHoldingTx usage to understand parameter semantics # Search for the function definition to understand parameter meanings ast-grep --pattern 'UpsertLockHoldingTx($$$)' # Search for other call sites to see what table types are passed rg -nP -C3 'UpsertLockHoldingTx\(' --type=cpp

Length of output: 5550

🏁 Script executed:

#!/bin/bash sed -n '2300,2370p' include/cc/catalog_cc_map.h sed -n '430,470p' include/cc/cc_shard.h

Length of output: 4283

Replace table_name_.Type() with table_name.Type() in the UpsertLockHoldingTx call
This ensures debug instrumentation records the actual table’s type rather than the catalog map’s type.
File: include/cc/catalog_cc_map.h:2368

🤖 Prompt for AI Agents

In include/cc/catalog_cc_map.h around lines 2367 to 2374, the call to shard_->UpsertLockHoldingTx currently passes table_name_.Type() which records the catalog map’s type; replace that argument with table_name.Type() so the actual table’s type is recorded. Update the UpsertLockHoldingTx invocation to use table_name.Type(), ensuring the table_name variable is in scope and correctly referenced (no other changes to surrounding logic).

coderabbitai · 2025-10-13T07:27:44Z

                TxNumber tx_number) const;

-    bool ReleaseCatalogRead(NonBlockingLock *lock) const;
+    bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const;


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify all call sites are updated to pass tx_number.

The signature change is correct and consistent with the implementation. However, as this is a public API change, ensure all callers of ReleaseCatalogRead have been updated to provide the new tx_number parameter.

Run the following script to verify all call sites have been updated:

🏁 Script executed:

#!/bin/bash # Description: Find all calls to ReleaseCatalogRead and verify they pass tx_number # Search for function calls to ReleaseCatalogRead rg -n -C3 '\bReleaseCatalogRead\s*\(' --type=cpp

Length of output: 1396

Verify correct argument types for ReleaseCatalogRead

The only call to ReleaseCatalogRead in src/tx_execution.cpp:325 passes db_idx.first (an integer index), not a NonBlockingLock*. Update this call to use the proper lock pointer (e.g., the first element of your lock container) along with tx_number.

🤖 Prompt for AI Agents

In include/cc/local_cc_handler.h around line 175, ReleaseCatalogRead currently expects a NonBlockingLock* and a TxNumber, but the only caller passes an integer index; update the caller in src/tx_execution.cpp (around line 325) to pass the actual lock pointer (e.g., the first element from the lock container, not db_idx.first) along with tx_number. Ensure you obtain the NonBlockingLock* from your lock container (or lookup by index and take the pointer/reference) and pass that pointer to ReleaseCatalogRead, and adjust any null checks if needed.

coderabbitai · 2025-10-13T07:27:44Z

                                << "Buffered cmds found on leader node"
+                                << ", cce key: " << cce->KeyString()
                                << ", cce CommitTs: " << cce->CommitTs() << "\n"
                                << buffered_cmds;
                            assert(false);


⚠️ Potential issue | 🟠 Major

Potential PII leak: logging full key in non-debug error path

LOG(ERROR) now includes cce->KeyString() which can expose user keys in prod logs. Prefer hashing/redaction or guard by debug flag.

Apply one of the following:

Redact/hash the key:

- LOG(ERROR) - << "Buffered cmds found on leader node" - << ", cce key: " << cce->KeyString() + LOG(ERROR) + << "Buffered cmds found on leader node" + << ", cce key_hash: 0x" << std::hex << key->Hash() << ", cce CommitTs: " << cce->CommitTs() << "\n" << buffered_cmds;

Or gate the key print behind DLOG/trace-only logging:

- LOG(ERROR) - << "Buffered cmds found on leader node" - << ", cce key: " << cce->KeyString() + LOG(ERROR) + << "Buffered cmds found on leader node" + << ", cce CommitTs: " << cce->CommitTs() << "\n" + << buffered_cmds; + DLOG(ERROR) + << "Buffered cmds detail, cce key: " << cce->KeyString();

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In include/cc/template_cc_map.h around lines 6535-6539, the error log prints cce->KeyString() which may expose user keys; replace this with a non-PII value or guard it: either (preferred) compute and log a one-way hash (e.g., SHA256 hex or truncated SHA256) of cce->KeyString() so the original key cannot be recovered, or (alternative) remove the key from LOG(ERROR) and only emit the full key under debug/trace logging (DLOG or guarded by a debug flag). Ensure the chosen hash uses a consistent method across code (and consider a per-instance salt if required), update the log message to remain useful without revealing raw keys, and keep the assert(false) behavior unchanged.

coderabbitai Bot reviewed Oct 9, 2025

View reviewed changes

MrGuin force-pushed the track_catalog_read branch from 20c9f9d to 29ba1b9 Compare October 11, 2025 06:35

MrGuin added 3 commits October 13, 2025 15:19

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build

1a8bb61

add debug log for AcquireAllCc blocking

bf95b37

update buffer command debug log

bdaf8ca

MrGuin force-pushed the track_catalog_read branch from 29ba1b9 to bdaf8ca Compare October 13, 2025 07:19

coderabbitai Bot reviewed Oct 13, 2025

View reviewed changes

thweetkomputer approved these changes Oct 24, 2025

View reviewed changes

MrGuin merged commit 2c913f4 into main Oct 24, 2025
3 of 4 checks passed

coderabbitai Bot mentioned this pull request Dec 3, 2025

remove schema modification in clean_ccm_op_ #236

Merged

coderabbitai Bot mentioned this pull request Dec 18, 2025

fix lock_hold_tx wlock_ts_ and txm CmdSet's wlock_ts_ #307

Merged

coderabbitai Bot mentioned this pull request Feb 26, 2026

ClearTx also clears ReaderWriterCntl #419

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build#154

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build#154
MrGuin merged 3 commits into
mainfrom
track_catalog_read

MrGuin commented Oct 9, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Oct 9, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Oct 13, 2025

Uh oh!

coderabbitai Bot Oct 13, 2025

Uh oh!

coderabbitai Bot Oct 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MrGuin commented Oct 9, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Here are some reminders before you submit the pull request

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MrGuin commented Oct 9, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Oct 9, 2025 •

edited

Loading