feat: pass shard_id to eloqstore#463
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughStartDB now requires a uint32_t shard_id parameter; the DataStore interface and EloqStoreDataStore signature were updated (default 0). Call sites were updated to pass data_shard_id. A new eloq_store_max_global_request_batch flag and init logic were added. The eloqstore submodule pointer was advanced. A term-guard was added to UpdateStandbyCkptTs in tx_service. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@store_handler/eloq_data_store_service/data_store_service.cpp`:
- Line 428: The implementations RocksDBDataStore::StartDB(int64_t term) and
RocksDBCloudDataStore::StartDB(int64_t term) have the wrong signature and must
be changed to match the base class virtual bool StartDB(int64_t term, uint32_t
shard_id); update both the method declarations and definitions to bool
RocksDBDataStore::StartDB(int64_t term, uint32_t shard_id) and bool
RocksDBCloudDataStore::StartDB(int64_t term, uint32_t shard_id), and adjust any
internal uses of the missing shard_id parameter; ensure the corresponding
factory code in rocksdb_data_store_factory.h and
rocksdb_cloud_data_store_factory.h and any callers (e.g., the call
shard_ref.data_store_->StartDB(term, data_shard_id)) compile against this
corrected signature.
In `@store_handler/eloq_data_store_service/data_store.h`:
- Line 65: The StartDB signature changed to StartDB(int64_t term, uint32_t
shard_id); update both the declaration and definition for
RocksDBDataStore::StartDB and RocksDBCloudDataStore::StartDB to match that exact
signature (include the uint32_t shard_id parameter) and keep the override
specifier; modify the corresponding .cpp implementations to accept and use the
shard_id parameter and adjust any internal calls or forwarded calls to StartDB
to pass the shard_id as well so the override contract and compilation succeed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c484af00-2d3e-4b5a-b248-7845534a8301
📒 Files selected for processing (4)
store_handler/eloq_data_store_service/data_store.hstore_handler/eloq_data_store_service/data_store_service.cppstore_handler/eloq_data_store_service/eloq_store_data_store.hstore_handler/eloq_data_store_service/eloqstore
There was a problem hiding this comment.
🧹 Nitpick comments (1)
store_handler/eloq_data_store_service/eloq_store_config.cpp (1)
809-815: Please add a config-precedence test for this new parameter.This adds a new CLI-vs-INI precedence path; a focused test would prevent silent regressions in parsing behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/eloq_store_config.cpp` around lines 809 - 815, Add a focused unit test that verifies the new precedence for eloqstore_configs_.max_global_request_batch: exercise three cases—(1) CLI flag set (simulate setting FLAGS_eloq_store_max_global_request_batch and ensure CheckCommandLineFlagIsDefault returns false) with an INI value different from the flag and assert the final value equals the CLI flag; (2) CLI flag left default (simulate CheckCommandLineFlagIsDefault true) and INI provides a value and assert the final value equals config_reader.GetInteger result; and (3) neither CLI nor INI provide a value and assert the default FLAGS_eloq_store_max_global_request_batch is used—use the code paths around eloqstore_configs_.max_global_request_batch, CheckCommandLineFlagIsDefault, FLAGS_eloq_store_max_global_request_batch, and config_reader.GetInteger to locate where to wire the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@store_handler/eloq_data_store_service/eloq_store_config.cpp`:
- Around line 809-815: Add a focused unit test that verifies the new precedence
for eloqstore_configs_.max_global_request_batch: exercise three cases—(1) CLI
flag set (simulate setting FLAGS_eloq_store_max_global_request_batch and ensure
CheckCommandLineFlagIsDefault returns false) with an INI value different from
the flag and assert the final value equals the CLI flag; (2) CLI flag left
default (simulate CheckCommandLineFlagIsDefault true) and INI provides a value
and assert the final value equals config_reader.GetInteger result; and (3)
neither CLI nor INI provide a value and assert the default
FLAGS_eloq_store_max_global_request_batch is used—use the code paths around
eloqstore_configs_.max_global_request_batch, CheckCommandLineFlagIsDefault,
FLAGS_eloq_store_max_global_request_batch, and config_reader.GetInteger to
locate where to wire the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28fa570e-afbe-45c0-9927-72a691758b41
📒 Files selected for processing (2)
store_handler/eloq_data_store_service/eloq_store_config.cppstore_handler/eloq_data_store_service/eloqstore
✅ Files skipped from review due to trivial changes (1)
- store_handler/eloq_data_store_service/eloqstore
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@store_handler/eloq_data_store_service/eloq_store_data_store.h`:
- Around line 180-183: The StartDB override currently declares a default value
for data_shard_id which differs from the base DataStore::StartDB signature;
remove the default argument from the derived declaration so the signature is
identical (change bool StartDB(int64_t term, uint32_t data_shard_id = 0) to bool
StartDB(int64_t term, uint32_t data_shard_id)), leaving the body that calls
eloq_store_service_->Start(term, data_shard_id) unchanged; ensure any callers
rely on the base's parameter semantics rather than the removed default.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3c3a31d0-4ccf-4dd7-8c0d-937187126a45
📒 Files selected for processing (1)
store_handler/eloq_data_store_service/eloq_store_data_store.h
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/src/remote/cc_node_service.cpp (1)
1852-1895:⚠️ Potential issue | 🟠 MajorRevalidate standby term inside the async task to avoid stale checkpoint updates.
Line 1852 validates
StandbyNodeTerm()before enqueue, but the lambda at Line 1884 executes later. If standby term changes in between, this stale task can still callOnUpdateStandbyCkptTsandUpdateNodeGroupCkptTsunder outdated term context. Add an execution-time term check in the lambda.🔧 Proposed fix
- auto standby_node_term = Sharder::Instance().StandbyNodeTerm(); + const int64_t standby_node_term = Sharder::Instance().StandbyNodeTerm(); if (standby_node_term == -1 || (standby_node_term >> 32) != request->ng_term()) { @@ const uint32_t ng_id = request->node_group_id(); const int64_t ng_term = request->ng_term(); const uint64_t ckpt_ts = request->primary_succ_ckpt_ts(); + const int64_t expected_standby_term = standby_node_term; EnqueueStandbyTask( {StandbyTaskType::UpdateStandbyCkptTs, ng_id, ng_term, ckpt_ts, - [store_hd, ng_id, ng_term, ckpt_ts, has_data_store_write]() + [store_hd, + ng_id, + ng_term, + ckpt_ts, + has_data_store_write, + expected_standby_term]() { + const int64_t current_standby_term = + Sharder::Instance().StandbyNodeTerm(); + if (current_standby_term != expected_standby_term || + current_standby_term < 0 || + PrimaryTermFromStandbyTerm(current_standby_term) != ng_term) + { + DLOG(INFO) << "Skip stale UpdateStandbyCkptTs task at execution, " + << "ng_id=" << ng_id + << ", expected_standby_term=" << expected_standby_term + << ", current_standby_term=" << current_standby_term + << ", ng_term=" << ng_term + << ", ckpt_ts=" << ckpt_ts; + return; + } const bool ok = store_hd == nullptr ? true : store_hd->OnUpdateStandbyCkptTs( ng_id, ng_term, ckpt_ts, !has_data_store_write); if (ok) { Sharder::Instance().UpdateNodeGroupCkptTs(ng_id, ckpt_ts); } }});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tx_service/src/remote/cc_node_service.cpp` around lines 1852 - 1895, The async task enqueued by EnqueueStandbyTask can run after StandbyNodeTerm() has changed, so revalidate the term inside the lambda and abort the task if it no longer matches ng_term: inside the lambda capture ng_term (already captured) and call Sharder::Instance().StandbyNodeTerm(), then check the same condition used before (term == -1 || (term >> 32) != ng_term) and return early if it fails; only call store_hd->OnUpdateStandbyCkptTs and Sharder::Instance().UpdateNodeGroupCkptTs when the in-task term check passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tx_service/src/remote/cc_node_service.cpp`:
- Around line 1852-1895: The async task enqueued by EnqueueStandbyTask can run
after StandbyNodeTerm() has changed, so revalidate the term inside the lambda
and abort the task if it no longer matches ng_term: inside the lambda capture
ng_term (already captured) and call Sharder::Instance().StandbyNodeTerm(), then
check the same condition used before (term == -1 || (term >> 32) != ng_term) and
return early if it fails; only call store_hd->OnUpdateStandbyCkptTs and
Sharder::Instance().UpdateNodeGroupCkptTs when the in-task term check passes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec533c31-602f-4ac4-9d55-a244a617ed0a
📒 Files selected for processing (1)
tx_service/src/remote/cc_node_service.cpp
7f2cce5 to
eb02ad5
Compare
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
Refactor
New Features
Bug Fixes
Chores