Skip to content

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build#154

Merged
MrGuin merged 3 commits into
mainfrom
track_catalog_read
Oct 24, 2025
Merged

obj_cmd_op read catalog UpsertLockHodlingTxn in debug build#154
MrGuin merged 3 commits into
mainfrom
track_catalog_read

Conversation

@MrGuin

@MrGuin MrGuin commented Oct 9, 2025

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • Bug Fixes

    • Prevented insertion of null internal entries to reduce rare crashes and undefined behavior.
  • Refactor

    • Made catalog read lock release transaction-aware via a public API update to improve correctness and traceability.
  • Chores

    • Added debug-only instrumentation and expanded debug logging around locks, blocking/resume events, and buffered-command handling to aid troubleshooting without affecting production.

@coderabbitai

coderabbitai Bot commented Oct 9, 2025

Copy link
Copy Markdown

Walkthrough

Propagates TxNumber through catalog read release APIs and call sites, adds debug-only instrumentation to record/delete transactions holding read locks, introduces extra debug/info logging in acquire paths, and prevents inserting null pointers into CcShard's cce_list_.

Changes

Cohort / File(s) Summary of changes
API signature updates (headers)
include/cc/local_cc_handler.h, include/cc/non_blocking_lock.h
LocalCcHandler::ReleaseCatalogRead and NonBlockingLock::ReleaseReadLockFast now accept an additional TxNumber tx_number parameter.
Call-site propagation
src/tx_execution.cpp, src/cc/local_cc_handler.cpp
Compute/preserve tx_number and pass it into LocalCcHandler::ReleaseCatalogRead(..., tx_number) and onward to ReleaseReadLockFast.
Debug-only lock tracking
include/cc/catalog_cc_map.h, src/cc/non_blocking_lock.cpp
Under #ifndef NDEBUG, call UpsertLockHoldingTx when acquiring and DeleteLockHoldingTx when releasing to track the transaction holding the lock.
Logging / observability
include/cc/template_cc_map.h, include/cc/object_cc_map.h, include/cc/non_blocking_lock.h
Added debug/info logs for resumed/blocked acquisitions, buffered-command details (including cce key), and a DLOG in read-conflict path.
Safety fix in shard list handling
src/cc/cc_shard.cpp
Guarded cce_list_.emplace_back(cce_ptr) with a null check to avoid inserting nullptr.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Tx as Transaction
  participant LCH as LocalCcHandler
  participant NBL as NonBlockingLock
  participant Shard as CcShard

  rect rgba(230,240,255,0.35)
  note over Tx,Shard: Catalog read acquire (existing)
  Tx->>LCH: AcquireCatalogRead(...)
  LCH->>NBL: AcquireReadLock(...)
  NBL-->>LCH: Lock acquired
  alt Debug build
    LCH->>Shard: UpsertLockHoldingTx(tx_number,...)
  end
  end

  rect rgba(230,255,230,0.35)
  note over Tx,Shard: Catalog read release (propagated tx_number)
  Tx->>LCH: ReleaseCatalogRead(lock, tx_number)
  LCH->>NBL: ReleaseReadLockFast(Shard, tx_number)
  alt Debug build
    NBL->>Shard: DeleteLockHoldingTx(tx_number,...)
  end
  NBL-->>LCH: Released / may unblock
  LCH-->>Tx: Result
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • liunyl

Poem

I hop through debug tunnels bright,
I note each tx that holds the light.
I skip null paws from the cce trail,
Log whispered hops where tests prevail.
A crunchy carrot for each byte. 🐇🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description only repeats the repository’s reminder checklist and does not include any summary of the changes, motivation, implementation details, tests, documentation updates, or references to related issues or RFCs. Please complete the description by summarizing the changes made, explaining why they were introduced, linking any relevant issue or RFC, listing testing steps or test suite results, and noting any documentation updates.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title “obj_cmd_op read catalog UpsertLockHodlingTxn in debug build” is unclear and contains a typo, uses internal shorthand, and does not concisely describe the primary change of adding debug-only instrumentation to record lock-holding transactions in the catalog read path. Please revise the title to a clear, concise summary of the main change, for example “Record transaction holding catalog read lock in debug builds” or “Add debug instrumentation for catalog read lock-holding transactions.”
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch track_catalog_read

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
include/cc/catalog_cc_map.h (1)

2367-2374: Derive node group from tx_number to match Delete path; avoid stale debug entries.

DeleteLockHoldingTx is called with cc_ng_id derived from tx_number ((tx_number >> 32) >> 10). Here Upsert uses node_group_id, which can differ. For symmetry and to prevent leaks in debug tracking, derive cc_ng_id from tx_number.

Apply:

 #ifndef NDEBUG
-            shard_->UpsertLockHoldingTx(tx_number,
-                                        ng_term,
-                                        nullptr,
-                                        false,
-                                        node_group_id,
-                                        table_name_.Type());
+            const NodeGroupId tx_ng =
+                static_cast<NodeGroupId>(((tx_number >> 32L) >> 10));
+            shard_->UpsertLockHoldingTx(tx_number,
+                                        ng_term,   // tx_term unused for reads
+                                        nullptr,
+                                        false,
+                                        tx_ng,
+                                        table_name_.Type());
 #endif

Also confirm ng_term is the intended second parameter for reads (tx_coord_term is only used for write recovery).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 055e500 and 9cd39ad.

📒 Files selected for processing (7)
  • include/cc/catalog_cc_map.h (1 hunks)
  • include/cc/local_cc_handler.h (1 hunks)
  • include/cc/non_blocking_lock.h (1 hunks)
  • src/cc/cc_shard.cpp (1 hunks)
  • src/cc/local_cc_handler.cpp (1 hunks)
  • src/cc/non_blocking_lock.cpp (1 hunks)
  • src/tx_execution.cpp (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
include/cc/local_cc_handler.h (1)
src/cc/local_cc_handler.cpp (2)
  • ReleaseCatalogRead (677-681)
  • ReleaseCatalogRead (677-677)
include/cc/non_blocking_lock.h (1)
src/cc/non_blocking_lock.cpp (2)
  • ReleaseReadLockFast (276-289)
  • ReleaseReadLockFast (276-276)
src/cc/non_blocking_lock.cpp (1)
include/cc/non_blocking_lock.h (1)
  • read_cnt_ (356-356)
include/cc/catalog_cc_map.h (3)
include/tx_operation.h (2)
  • table_name_ (151-152)
  • table_name_ (1065-1065)
include/tx_request.h (3)
  • table_name_ (750-1066)
  • table_name_ (780-781)
  • table_name_ (800-910)
include/remote/remote_cc_request.h (1)
  • table_name_ (808-809)
src/tx_execution.cpp (1)
include/tx_execution.h (1)
  • TxNumber (211-228)
🔇 Additional comments (6)
src/cc/local_cc_handler.cpp (1)

677-681: LGTM: Clean tx_number propagation for catalog read lock release.

The method signature change correctly adds the tx_number parameter and forwards it to ReleaseReadLockFast, enabling debug-only tracking of which transaction is releasing the catalog read lock.

src/cc/non_blocking_lock.cpp (1)

276-289: LGTM: Debug instrumentation correctly tracks catalog read lock releases.

The changes properly add debug-only tracking of catalog read lock releases by transaction number:

  • The signature change is minimal and correct.
  • The debug code is properly guarded with #ifndef NDEBUG, so production builds are unaffected.
  • The bit arithmetic (tx_number >> 32L) >> 10 correctly extracts the node group ID (consistent with the pattern used elsewhere, e.g., line 817 in local_cc_handler.cpp).
  • The nullptr parameter is appropriate for catalog reads, which use the fast path without a regular CcEntry.
src/tx_execution.cpp (1)

318-326: Scoped tx_number capture looks good.

Grabbing tx_number once avoids repeated atomic loads and safely propagates the context to ReleaseCatalogRead. No issues spotted.

src/cc/cc_shard.cpp (1)

735-738: Good fix: avoid inserting nullptr into cce_list_.

Prevents invalid set entries and aligns with DeleteLockHoldingTx(cce_ptr) semantics.

include/cc/non_blocking_lock.h (1)

134-134: ReleaseReadLockFast signature usages verified
LocalCcHandler::ReleaseCatalogRead correctly passes tx_number; no missing invocations found.

include/cc/local_cc_handler.h (1)

175-175: ReleaseCatalogRead callers pass tx_number All call sites now pass tx_number and the implementation correctly forwards it to ReleaseReadLockFast.

@MrGuin MrGuin force-pushed the track_catalog_read branch from 20c9f9d to 29ba1b9 Compare October 11, 2025 06:35
@MrGuin MrGuin force-pushed the track_catalog_read branch from 29ba1b9 to bdaf8ca Compare October 13, 2025 07:19

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
include/cc/local_cc_handler.h (1)

175-175: Consider adding parameter documentation.

The new tx_number parameter lacks documentation explaining its purpose. Consider adding a comment to clarify that it identifies the transaction releasing the catalog read lock (used for debug instrumentation).

Example documentation addition above the function declaration:

/// <summary>
/// Releases a catalog read lock.
/// </summary>
/// <param name="lock">The lock to release</param>
/// <param name="tx_number">Transaction number for debug tracking</param>
/// <returns>True if the lock was successfully released</returns>
bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29ba1b9 and bdaf8ca.

📒 Files selected for processing (9)
  • include/cc/catalog_cc_map.h (1 hunks)
  • include/cc/local_cc_handler.h (1 hunks)
  • include/cc/non_blocking_lock.h (2 hunks)
  • include/cc/object_cc_map.h (1 hunks)
  • include/cc/template_cc_map.h (3 hunks)
  • src/cc/cc_shard.cpp (1 hunks)
  • src/cc/local_cc_handler.cpp (1 hunks)
  • src/cc/non_blocking_lock.cpp (1 hunks)
  • src/tx_execution.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • include/cc/non_blocking_lock.h
  • src/cc/non_blocking_lock.cpp
  • src/tx_execution.cpp
  • src/cc/cc_shard.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
include/cc/local_cc_handler.h (1)
src/cc/local_cc_handler.cpp (2)
  • ReleaseCatalogRead (677-682)
  • ReleaseCatalogRead (677-678)
include/cc/catalog_cc_map.h (3)
include/tx_operation.h (2)
  • table_name_ (151-152)
  • table_name_ (1065-1065)
include/tx_request.h (3)
  • table_name_ (750-1066)
  • table_name_ (780-781)
  • table_name_ (800-910)
include/remote/remote_cc_request.h (1)
  • table_name_ (808-809)
🔇 Additional comments (4)
include/cc/object_cc_map.h (1)

526-526: LGTM! Helpful debug information added.

Adding the CCE key to the error log provides valuable context for debugging buffered command issues on the leader node. This complements the existing CommitTs in the log output.

src/cc/local_cc_handler.cpp (1)

677-681: LGTM! Parameter propagation is correct.

The signature change correctly adds the TxNumber tx_number parameter and forwards it to ReleaseReadLockFast. The implementation aligns with the header declaration and enables debug-only transaction tracking as intended by this PR.

include/cc/template_cc_map.h (2)

719-723: Good addition: debug resume trace

Adds helpful context (txn, table, key, lock, err) when resuming. Safe as DLOG.


932-935: Good addition: debug blocked trace

Extra DLOG on ACQUIRE_LOCK_BLOCKED will aid diagnosis without runtime cost in release.

Comment on lines +2367 to +2374
#ifndef NDEBUG
shard_->UpsertLockHoldingTx(tx_number,
ng_term,
nullptr,
false,
node_group_id,
table_name_.Type());
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the table type parameter in debug instrumentation.

The call to UpsertLockHoldingTx passes table_name_.Type(), which returns TableType::Catalog for this catalog CC map. However, the method parameter table_name represents the specific table being read.

Should this be table_name.Type() instead to track the actual table type being locked, or is the intention to record that the catalog map itself was accessed? The current implementation could lead to confusion during debugging if the intent is to track locks on specific table types.

Run the following script to examine how UpsertLockHoldingTx is used elsewhere and understand its expected semantics:

Based on learnings: The PR introduces debug instrumentation to track transactions holding read locks on catalog entries. Clarification is needed on whether tracking should use the catalog's type or the actual table's type.


🏁 Script executed:

#!/bin/bash
# Description: Check UpsertLockHoldingTx usage to understand parameter semantics

# Search for the function definition to understand parameter meanings
ast-grep --pattern 'UpsertLockHoldingTx($$$)'

# Search for other call sites to see what table types are passed
rg -nP -C3 'UpsertLockHoldingTx\(' --type=cpp

Length of output: 5550


🏁 Script executed:

#!/bin/bash
sed -n '2300,2370p' include/cc/catalog_cc_map.h
sed -n '430,470p' include/cc/cc_shard.h

Length of output: 4283


Replace table_name_.Type() with table_name.Type() in the UpsertLockHoldingTx call
This ensures debug instrumentation records the actual table’s type rather than the catalog map’s type.
File: include/cc/catalog_cc_map.h:2368

🤖 Prompt for AI Agents
In include/cc/catalog_cc_map.h around lines 2367 to 2374, the call to
shard_->UpsertLockHoldingTx currently passes table_name_.Type() which records
the catalog map’s type; replace that argument with table_name.Type() so the
actual table’s type is recorded. Update the UpsertLockHoldingTx invocation to
use table_name.Type(), ensuring the table_name variable is in scope and
correctly referenced (no other changes to surrounding logic).

TxNumber tx_number) const;

bool ReleaseCatalogRead(NonBlockingLock *lock) const;
bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify all call sites are updated to pass tx_number.

The signature change is correct and consistent with the implementation. However, as this is a public API change, ensure all callers of ReleaseCatalogRead have been updated to provide the new tx_number parameter.

Run the following script to verify all call sites have been updated:


🏁 Script executed:

#!/bin/bash
# Description: Find all calls to ReleaseCatalogRead and verify they pass tx_number

# Search for function calls to ReleaseCatalogRead
rg -n -C3 '\bReleaseCatalogRead\s*\(' --type=cpp

Length of output: 1396


Verify correct argument types for ReleaseCatalogRead

The only call to ReleaseCatalogRead in src/tx_execution.cpp:325 passes db_idx.first (an integer index), not a NonBlockingLock*. Update this call to use the proper lock pointer (e.g., the first element of your lock container) along with tx_number.

🤖 Prompt for AI Agents
In include/cc/local_cc_handler.h around line 175, ReleaseCatalogRead currently
expects a NonBlockingLock* and a TxNumber, but the only caller passes an integer
index; update the caller in src/tx_execution.cpp (around line 325) to pass the
actual lock pointer (e.g., the first element from the lock container, not
db_idx.first) along with tx_number. Ensure you obtain the NonBlockingLock* from
your lock container (or lookup by index and take the pointer/reference) and pass
that pointer to ReleaseCatalogRead, and adjust any null checks if needed.

Comment on lines 6535 to 6539
<< "Buffered cmds found on leader node"
<< ", cce key: " << cce->KeyString()
<< ", cce CommitTs: " << cce->CommitTs() << "\n"
<< buffered_cmds;
assert(false);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential PII leak: logging full key in non-debug error path

LOG(ERROR) now includes cce->KeyString() which can expose user keys in prod logs. Prefer hashing/redaction or guard by debug flag.

Apply one of the following:

  • Redact/hash the key:
-                            LOG(ERROR)
-                                << "Buffered cmds found on leader node"
-                                << ", cce key: " << cce->KeyString()
+                            LOG(ERROR)
+                                << "Buffered cmds found on leader node"
+                                << ", cce key_hash: 0x" << std::hex << key->Hash()
                                 << ", cce CommitTs: " << cce->CommitTs() << "\n"
                                 << buffered_cmds;
  • Or gate the key print behind DLOG/trace-only logging:
-                            LOG(ERROR)
-                                << "Buffered cmds found on leader node"
-                                << ", cce key: " << cce->KeyString()
+                            LOG(ERROR)
+                                << "Buffered cmds found on leader node"
+                                << ", cce CommitTs: " << cce->CommitTs() << "\n"
+                                << buffered_cmds;
+                            DLOG(ERROR)
+                                << "Buffered cmds detail, cce key: " << cce->KeyString();

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In include/cc/template_cc_map.h around lines 6535-6539, the error log prints
cce->KeyString() which may expose user keys; replace this with a non-PII value
or guard it: either (preferred) compute and log a one-way hash (e.g., SHA256 hex
or truncated SHA256) of cce->KeyString() so the original key cannot be
recovered, or (alternative) remove the key from LOG(ERROR) and only emit the
full key under debug/trace logging (DLOG or guarded by a debug flag). Ensure the
chosen hash uses a consistent method across code (and consider a per-instance
salt if required), update the log message to remain useful without revealing raw
keys, and keep the assert(false) behavior unchanged.

@MrGuin MrGuin merged commit 2c913f4 into main Oct 24, 2025
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants