obj_cmd_op read catalog UpsertLockHodlingTxn in debug build#154
Conversation
WalkthroughPropagates TxNumber through catalog read release APIs and call sites, adds debug-only instrumentation to record/delete transactions holding read locks, introduces extra debug/info logging in acquire paths, and prevents inserting null pointers into CcShard's cce_list_. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Tx as Transaction
participant LCH as LocalCcHandler
participant NBL as NonBlockingLock
participant Shard as CcShard
rect rgba(230,240,255,0.35)
note over Tx,Shard: Catalog read acquire (existing)
Tx->>LCH: AcquireCatalogRead(...)
LCH->>NBL: AcquireReadLock(...)
NBL-->>LCH: Lock acquired
alt Debug build
LCH->>Shard: UpsertLockHoldingTx(tx_number,...)
end
end
rect rgba(230,255,230,0.35)
note over Tx,Shard: Catalog read release (propagated tx_number)
Tx->>LCH: ReleaseCatalogRead(lock, tx_number)
LCH->>NBL: ReleaseReadLockFast(Shard, tx_number)
alt Debug build
NBL->>Shard: DeleteLockHoldingTx(tx_number,...)
end
NBL-->>LCH: Released / may unblock
LCH-->>Tx: Result
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
include/cc/catalog_cc_map.h (1)
2367-2374: Derive node group from tx_number to match Delete path; avoid stale debug entries.DeleteLockHoldingTx is called with cc_ng_id derived from tx_number ((tx_number >> 32) >> 10). Here Upsert uses node_group_id, which can differ. For symmetry and to prevent leaks in debug tracking, derive cc_ng_id from tx_number.
Apply:
#ifndef NDEBUG - shard_->UpsertLockHoldingTx(tx_number, - ng_term, - nullptr, - false, - node_group_id, - table_name_.Type()); + const NodeGroupId tx_ng = + static_cast<NodeGroupId>(((tx_number >> 32L) >> 10)); + shard_->UpsertLockHoldingTx(tx_number, + ng_term, // tx_term unused for reads + nullptr, + false, + tx_ng, + table_name_.Type()); #endifAlso confirm ng_term is the intended second parameter for reads (tx_coord_term is only used for write recovery).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
include/cc/catalog_cc_map.h(1 hunks)include/cc/local_cc_handler.h(1 hunks)include/cc/non_blocking_lock.h(1 hunks)src/cc/cc_shard.cpp(1 hunks)src/cc/local_cc_handler.cpp(1 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/tx_execution.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
include/cc/local_cc_handler.h (1)
src/cc/local_cc_handler.cpp (2)
ReleaseCatalogRead(677-681)ReleaseCatalogRead(677-677)
include/cc/non_blocking_lock.h (1)
src/cc/non_blocking_lock.cpp (2)
ReleaseReadLockFast(276-289)ReleaseReadLockFast(276-276)
src/cc/non_blocking_lock.cpp (1)
include/cc/non_blocking_lock.h (1)
read_cnt_(356-356)
include/cc/catalog_cc_map.h (3)
include/tx_operation.h (2)
table_name_(151-152)table_name_(1065-1065)include/tx_request.h (3)
table_name_(750-1066)table_name_(780-781)table_name_(800-910)include/remote/remote_cc_request.h (1)
table_name_(808-809)
src/tx_execution.cpp (1)
include/tx_execution.h (1)
TxNumber(211-228)
🔇 Additional comments (6)
src/cc/local_cc_handler.cpp (1)
677-681: LGTM: Clean tx_number propagation for catalog read lock release.The method signature change correctly adds the
tx_numberparameter and forwards it toReleaseReadLockFast, enabling debug-only tracking of which transaction is releasing the catalog read lock.src/cc/non_blocking_lock.cpp (1)
276-289: LGTM: Debug instrumentation correctly tracks catalog read lock releases.The changes properly add debug-only tracking of catalog read lock releases by transaction number:
- The signature change is minimal and correct.
- The debug code is properly guarded with
#ifndef NDEBUG, so production builds are unaffected.- The bit arithmetic
(tx_number >> 32L) >> 10correctly extracts the node group ID (consistent with the pattern used elsewhere, e.g., line 817 inlocal_cc_handler.cpp).- The
nullptrparameter is appropriate for catalog reads, which use the fast path without a regularCcEntry.src/tx_execution.cpp (1)
318-326: Scoped tx_number capture looks good.Grabbing
tx_numberonce avoids repeated atomic loads and safely propagates the context toReleaseCatalogRead. No issues spotted.src/cc/cc_shard.cpp (1)
735-738: Good fix: avoid inserting nullptr into cce_list_.Prevents invalid set entries and aligns with DeleteLockHoldingTx(cce_ptr) semantics.
include/cc/non_blocking_lock.h (1)
134-134: ReleaseReadLockFast signature usages verified
LocalCcHandler::ReleaseCatalogRead correctly passes tx_number; no missing invocations found.include/cc/local_cc_handler.h (1)
175-175: ReleaseCatalogRead callers pass tx_number All call sites now pass tx_number and the implementation correctly forwards it to ReleaseReadLockFast.
20c9f9d to
29ba1b9
Compare
29ba1b9 to
bdaf8ca
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
include/cc/local_cc_handler.h (1)
175-175: Consider adding parameter documentation.The new
tx_numberparameter lacks documentation explaining its purpose. Consider adding a comment to clarify that it identifies the transaction releasing the catalog read lock (used for debug instrumentation).Example documentation addition above the function declaration:
/// <summary> /// Releases a catalog read lock. /// </summary> /// <param name="lock">The lock to release</param> /// <param name="tx_number">Transaction number for debug tracking</param> /// <returns>True if the lock was successfully released</returns> bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
include/cc/catalog_cc_map.h(1 hunks)include/cc/local_cc_handler.h(1 hunks)include/cc/non_blocking_lock.h(2 hunks)include/cc/object_cc_map.h(1 hunks)include/cc/template_cc_map.h(3 hunks)src/cc/cc_shard.cpp(1 hunks)src/cc/local_cc_handler.cpp(1 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/tx_execution.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- include/cc/non_blocking_lock.h
- src/cc/non_blocking_lock.cpp
- src/tx_execution.cpp
- src/cc/cc_shard.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
include/cc/local_cc_handler.h (1)
src/cc/local_cc_handler.cpp (2)
ReleaseCatalogRead(677-682)ReleaseCatalogRead(677-678)
include/cc/catalog_cc_map.h (3)
include/tx_operation.h (2)
table_name_(151-152)table_name_(1065-1065)include/tx_request.h (3)
table_name_(750-1066)table_name_(780-781)table_name_(800-910)include/remote/remote_cc_request.h (1)
table_name_(808-809)
🔇 Additional comments (4)
include/cc/object_cc_map.h (1)
526-526: LGTM! Helpful debug information added.Adding the CCE key to the error log provides valuable context for debugging buffered command issues on the leader node. This complements the existing CommitTs in the log output.
src/cc/local_cc_handler.cpp (1)
677-681: LGTM! Parameter propagation is correct.The signature change correctly adds the
TxNumber tx_numberparameter and forwards it toReleaseReadLockFast. The implementation aligns with the header declaration and enables debug-only transaction tracking as intended by this PR.include/cc/template_cc_map.h (2)
719-723: Good addition: debug resume traceAdds helpful context (txn, table, key, lock, err) when resuming. Safe as DLOG.
932-935: Good addition: debug blocked traceExtra DLOG on ACQUIRE_LOCK_BLOCKED will aid diagnosis without runtime cost in release.
| #ifndef NDEBUG | ||
| shard_->UpsertLockHoldingTx(tx_number, | ||
| ng_term, | ||
| nullptr, | ||
| false, | ||
| node_group_id, | ||
| table_name_.Type()); | ||
| #endif |
There was a problem hiding this comment.
🧩 Analysis chain
Verify the table type parameter in debug instrumentation.
The call to UpsertLockHoldingTx passes table_name_.Type(), which returns TableType::Catalog for this catalog CC map. However, the method parameter table_name represents the specific table being read.
Should this be table_name.Type() instead to track the actual table type being locked, or is the intention to record that the catalog map itself was accessed? The current implementation could lead to confusion during debugging if the intent is to track locks on specific table types.
Run the following script to examine how UpsertLockHoldingTx is used elsewhere and understand its expected semantics:
Based on learnings: The PR introduces debug instrumentation to track transactions holding read locks on catalog entries. Clarification is needed on whether tracking should use the catalog's type or the actual table's type.
🏁 Script executed:
#!/bin/bash
# Description: Check UpsertLockHoldingTx usage to understand parameter semantics
# Search for the function definition to understand parameter meanings
ast-grep --pattern 'UpsertLockHoldingTx($$$)'
# Search for other call sites to see what table types are passed
rg -nP -C3 'UpsertLockHoldingTx\(' --type=cppLength of output: 5550
🏁 Script executed:
#!/bin/bash
sed -n '2300,2370p' include/cc/catalog_cc_map.h
sed -n '430,470p' include/cc/cc_shard.hLength of output: 4283
Replace table_name_.Type() with table_name.Type() in the UpsertLockHoldingTx call
This ensures debug instrumentation records the actual table’s type rather than the catalog map’s type.
File: include/cc/catalog_cc_map.h:2368
🤖 Prompt for AI Agents
In include/cc/catalog_cc_map.h around lines 2367 to 2374, the call to
shard_->UpsertLockHoldingTx currently passes table_name_.Type() which records
the catalog map’s type; replace that argument with table_name.Type() so the
actual table’s type is recorded. Update the UpsertLockHoldingTx invocation to
use table_name.Type(), ensuring the table_name variable is in scope and
correctly referenced (no other changes to surrounding logic).
| TxNumber tx_number) const; | ||
|
|
||
| bool ReleaseCatalogRead(NonBlockingLock *lock) const; | ||
| bool ReleaseCatalogRead(NonBlockingLock *lock, TxNumber tx_number) const; |
There was a problem hiding this comment.
🧩 Analysis chain
Verify all call sites are updated to pass tx_number.
The signature change is correct and consistent with the implementation. However, as this is a public API change, ensure all callers of ReleaseCatalogRead have been updated to provide the new tx_number parameter.
Run the following script to verify all call sites have been updated:
🏁 Script executed:
#!/bin/bash
# Description: Find all calls to ReleaseCatalogRead and verify they pass tx_number
# Search for function calls to ReleaseCatalogRead
rg -n -C3 '\bReleaseCatalogRead\s*\(' --type=cppLength of output: 1396
Verify correct argument types for ReleaseCatalogRead
The only call to ReleaseCatalogRead in src/tx_execution.cpp:325 passes db_idx.first (an integer index), not a NonBlockingLock*. Update this call to use the proper lock pointer (e.g., the first element of your lock container) along with tx_number.
🤖 Prompt for AI Agents
In include/cc/local_cc_handler.h around line 175, ReleaseCatalogRead currently
expects a NonBlockingLock* and a TxNumber, but the only caller passes an integer
index; update the caller in src/tx_execution.cpp (around line 325) to pass the
actual lock pointer (e.g., the first element from the lock container, not
db_idx.first) along with tx_number. Ensure you obtain the NonBlockingLock* from
your lock container (or lookup by index and take the pointer/reference) and pass
that pointer to ReleaseCatalogRead, and adjust any null checks if needed.
| << "Buffered cmds found on leader node" | ||
| << ", cce key: " << cce->KeyString() | ||
| << ", cce CommitTs: " << cce->CommitTs() << "\n" | ||
| << buffered_cmds; | ||
| assert(false); |
There was a problem hiding this comment.
Potential PII leak: logging full key in non-debug error path
LOG(ERROR) now includes cce->KeyString() which can expose user keys in prod logs. Prefer hashing/redaction or guard by debug flag.
Apply one of the following:
- Redact/hash the key:
- LOG(ERROR)
- << "Buffered cmds found on leader node"
- << ", cce key: " << cce->KeyString()
+ LOG(ERROR)
+ << "Buffered cmds found on leader node"
+ << ", cce key_hash: 0x" << std::hex << key->Hash()
<< ", cce CommitTs: " << cce->CommitTs() << "\n"
<< buffered_cmds;- Or gate the key print behind DLOG/trace-only logging:
- LOG(ERROR)
- << "Buffered cmds found on leader node"
- << ", cce key: " << cce->KeyString()
+ LOG(ERROR)
+ << "Buffered cmds found on leader node"
+ << ", cce CommitTs: " << cce->CommitTs() << "\n"
+ << buffered_cmds;
+ DLOG(ERROR)
+ << "Buffered cmds detail, cce key: " << cce->KeyString();Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In include/cc/template_cc_map.h around lines 6535-6539, the error log prints
cce->KeyString() which may expose user keys; replace this with a non-PII value
or guard it: either (preferred) compute and log a one-way hash (e.g., SHA256 hex
or truncated SHA256) of cce->KeyString() so the original key cannot be
recovered, or (alternative) remove the key from LOG(ERROR) and only emit the
full key under debug/trace logging (DLOG or guarded by a debug flag). Ensure the
chosen hash uses a consistent method across code (and consider a per-instance
salt if required), update the log message to remain useful without revealing raw
keys, and keep the assert(false) behavior unchanged.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
Bug Fixes
Refactor
Chores