test: Phase 1 unit-test framework — MemDataStore backend + TestNode fixture#506
Conversation
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ice path MemDataStore now releases the pooled *LocalRequest it receives (PoolableGuard, mirroring RocksDBDataStoreCommon); otherwise the DataStoreService thread_local ObjectPool destructor spins forever at thread exit. The round-trip test now exercises the real DataStoreService local Read/BatchWriteRecords overloads instead of stack requests on a captured DataStore*. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Wire TxHandle::Upsert/Delete/Read over UpsertTxRequest/ReadTxRequest and make the EloqKv test table resolvable in the mock-catalog TxService: - TestNode now brings the engine up with skip_kv=true, enable_cache_replacement=false, and a prebuilt-tables map for the test table, so its schema is installed at leader start and Read/Upsert resolve a CcMap without a KV catalog fetch. store_hd is passed as nullptr to the TxService (the checkpointer early-returns and never tries to data-sync the in-memory-only table); the DSS bring-up stays as scaffolding. - MockCatalogFactory: initialize MockTableSchema::key_schema_ (InitCcm/ CreatePkCcMap dereference KeySchema()) and create the Pk CcMap as hash-partitioned (RangePartitioned=false) to match an EloqKv table, so an absent-key read takes the full-entries fast path instead of PinRangeSlice. - TestNodeSmoke-Test proves a write->commit->read==value round-trip, an absent-key read returning false, overwrite, and delete, all within a single TestNode (the Sharder singleton allows only one per process). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The last code reference was removed when StartTsCollector-Test migrated to the TestNode fixture; only a descriptive comment in mem_data_store.h remains. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughPhase 1 wires tests into CI, adds a CMake WITH_TESTS gate, implements an in-memory MemDataStore + factory, introduces the TestNode in-process fixture and TxHandle helpers, migrates and expands Catch2 tests, and applies supporting fixes for keys and fast-meta-data shard safety. ChangesPhase 1 Test Framework: CI Integration, MemDataStore Backend, TestNode Harness
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (3)
.github/workflows/unit-tests.yml (2)
15-18: ⚡ Quick winConsider hardening checkout step for supply-chain security.
Two security improvements:
- Pin
actions/checkoutto a commit SHA rather than a mutable tag (@v4) to prevent supply-chain attacks via action updates.- Add
persist-credentials: falseto prevent the GitHub token from being available to subsequent steps (defense in depth).🔒 Proposed hardening
- name: Checkout (recursive submodules) - uses: actions/checkout@v4 + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 with: submodules: recursive + persist-credentials: false🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/unit-tests.yml around lines 15 - 18, Update the GitHub Actions checkout step to harden supply-chain security by replacing the mutable tag uses: actions/checkout@v4 with a specific commit SHA (pin to a full commit hash for actions/checkout) and add the option persist-credentials: false while preserving existing options like submodules: recursive; modify the checkout step (the "Checkout (recursive submodules)" step) to include the pinned ref and persist-credentials: false.Source: Linters/SAST tools
12-12: ⚡ Quick winPin container image to a digest or versioned tag.
The
:latesttag is mutable and can change unexpectedly, breaking build reproducibility. Pin to a specific digest (@sha256:...) or use a versioned tag (e.g.,:v1.2.3) to ensure deterministic builds.📌 Example pinning to digest
container: - image: eloqdata/eloq-dev-ci-ubuntu2404:latest + image: eloqdata/eloq-dev-ci-ubuntu2404@sha256:<digest>Obtain the digest via:
docker pull eloqdata/eloq-dev-ci-ubuntu2404:latest docker inspect --format='{{index .RepoDigests 0}}' eloqdata/eloq-dev-ci-ubuntu2404:latest🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/unit-tests.yml at line 12, Replace the mutable image reference "image: eloqdata/eloq-dev-ci-ubuntu2404:latest" with a pinned, immutable identifier: either a specific versioned tag (e.g., :vX.Y.Z) or the image digest form (`@sha256`:...), so CI uses a deterministic image; obtain the digest locally via docker pull and docker inspect (as suggested) and update the workflow line to use that digest or chosen versioned tag.Source: Linters/SAST tools
tx_service/tests/CMakeLists.txt (1)
11-23: 💤 Low valueConsider disabling compiler extensions on
test_harnessfor consistency.Line 47 sets
CXX_EXTENSIONS OFFon each test executable, buttest_harness(line 22) only setsCXX_STANDARD 20. While not a functional issue (each test explicitly disables extensions), applying the same property to the harness library improves consistency and prevents accidental reliance on compiler-specific extensions in shared test utilities.🔧 Proposed consistency fix
target_link_libraries(test_harness PUBLIC data_substrate) set_property(TARGET test_harness PROPERTY CXX_STANDARD 20) +set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/tests/CMakeLists.txt` around lines 11 - 23, The test_harness target currently sets only CXX_STANDARD; disable compiler-specific extensions for consistency by setting the CXX_EXTENSIONS property OFF on the test_harness target (e.g., add set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF) alongside the existing set_property(TARGET test_harness PROPERTY CXX_STANDARD 20)), ensuring shared test utilities cannot rely on compiler extensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tx_service/tests/harness/mem_data_store_factory.h`:
- Around line 21-29: CreateDataStore currently calls MemDataStore::Initialize()
and MemDataStore::StartDB(...) but ignores their failures; change
CreateDataStore to check the return/result of ds->Initialize() and of
ds->StartDB(term[, shard_id]) and, on failure, clean up and return a failure
indicator (e.g., nullptr or an error) instead of returning a started-but-broken
MemDataStore; ensure both code paths (the `#ifdef` branch and the else branch)
perform the same checks and propagate the failure, and add brief logging for the
failure path to aid debugging.
In `@tx_service/tests/harness/mem_data_store.cpp`:
- Around line 141-195: DeleteRange, CreateTable, and DropTable mutate store_ but
do not honor read_only_; add the same read-only guard used in BatchWriteRecords:
at the start of each method (after PoolableGuard construction) check if
read_only_ is true and if so construct a ::EloqDS::remote::CommonResult with
error_code ::EloqDS::remote::DataStoreError::READ_ONLY and immediately call
req->SetFinish(result) and return, otherwise proceed with the existing locking
and mutation logic; reference the methods DeleteRange, CreateTable, DropTable
and the PoolableGuard usage so you place the early-return check consistently in
each.
In `@tx_service/tests/harness/test_node.cpp`:
- Around line 127-134: The ReserveTxPortWindow() logic currently closes the
probe sockets (the fds vector) before returning base, which lets another process
steal base..base+3; modify ReserveTxPortWindow so that probe sockets are not
closed when reservation succeeds: only close the fds on failure (when ok is
false) and retain them until consumers have bound (or transfer ownership to the
caller to close later); specifically, remove or move the ::close(fd) loop out of
the successful-return path that returns base and ensure the fds remain open
while returning a successful reservation.
- Around line 407-411: In BeginTx, do not ignore InitTxRequest failures: after
calling init.Wait() check the InitTxRequest result/status (the response or error
field on the InitTxRequest instance) and if it indicates failure, fail fast by
propagating the error (return an error/invalid TxHandle or throw) instead of
returning a normal TxHandle; update the code around InitTxRequest init(...),
init.Reset(), txm->Execute(&init), init.Wait() so failed initialization
short-circuits and does not construct or return a successful TxHandle.
In `@tx_service/tests/MemDataStore-Test.cpp`:
- Around line 25-43: The test currently picks a free port with PickFreePort()
but calls DataStoreService::StartService() only once, which races if another
process grabs the port; wrap the StartService call in a short retry loop (e.g.,
3–5 attempts) that on bind failure (EADDRINUSE or StartService failure
indicating port in use) calls PickFreePort() again and re-invokes
DataStoreService::StartService(), with a tiny backoff between attempts, and fail
only after retries are exhausted; reference PickFreePort() and
DataStoreService::StartService() when locating the change.
In `@tx_service/tests/StartTsCollector-Test.cpp`:
- Line 62: The test uses fixed sleep(3) calls to wait for ts_base_ changes;
replace each fixed sleep in StartTsCollector-Test.cpp with a bounded polling
loop that repeatedly checks the intended condition (e.g., that ts_base_ has the
expected new value or the collector reached the expected state) and breaks when
satisfied, sleeping for a short interval (10–50ms) between checks and enforcing
a total timeout (e.g., 3s) that fails the test on expiry; update every
occurrence of sleep(3) in the test to this pattern and ensure the polling
inspects the same variable/condition the test asserts (ts_base_ or the collector
status) so waits are deterministic and bounded.
- Around line 119-122: The test main currently calls Catch::Session().run(argc,
argv) directly and skips gflags parsing; restore gflags parsing by invoking the
appropriate ParseCommandLineFlags (e.g., gflags::ParseCommandLineFlags(&argc,
&argv, true) or google::ParseCommandLineFlags) at the start of main before
calling Catch::Session().run so runtime flags are initialized for brpc and other
components (update the main function where Catch::Session().run is called).
---
Nitpick comments:
In @.github/workflows/unit-tests.yml:
- Around line 15-18: Update the GitHub Actions checkout step to harden
supply-chain security by replacing the mutable tag uses: actions/checkout@v4
with a specific commit SHA (pin to a full commit hash for actions/checkout) and
add the option persist-credentials: false while preserving existing options like
submodules: recursive; modify the checkout step (the "Checkout (recursive
submodules)" step) to include the pinned ref and persist-credentials: false.
- Line 12: Replace the mutable image reference "image:
eloqdata/eloq-dev-ci-ubuntu2404:latest" with a pinned, immutable identifier:
either a specific versioned tag (e.g., :vX.Y.Z) or the image digest form
(`@sha256`:...), so CI uses a deterministic image; obtain the digest locally via
docker pull and docker inspect (as suggested) and update the workflow line to
use that digest or chosen versioned tag.
In `@tx_service/tests/CMakeLists.txt`:
- Around line 11-23: The test_harness target currently sets only CXX_STANDARD;
disable compiler-specific extensions for consistency by setting the
CXX_EXTENSIONS property OFF on the test_harness target (e.g., add
set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF) alongside the
existing set_property(TARGET test_harness PROPERTY CXX_STANDARD 20)), ensuring
shared test utilities cannot rely on compiler extensions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bdeee519-854e-4d33-a851-88baffddd50a
📒 Files selected for processing (17)
.github/workflows/unit-tests.ymlCMakeLists.txttx_service/include/store/int_mem_store.htx_service/tests/CMakeLists.txttx_service/tests/CcEntry-Test.cpptx_service/tests/CcPage-Test.cpptx_service/tests/LargeObjLRU-Test.cpptx_service/tests/MemDataStore-Test.cpptx_service/tests/StartTsCollector-Test.cpptx_service/tests/TestNodeSmoke-Test.cpptx_service/tests/TxConsistency-Test.cpptx_service/tests/harness/mem_data_store.cpptx_service/tests/harness/mem_data_store.htx_service/tests/harness/mem_data_store_factory.htx_service/tests/harness/test_node.cpptx_service/tests/harness/test_node.htx_service/tests/include/mock/mock_catalog_factory.h
💤 Files with no reviewable changes (4)
- tx_service/tests/CcEntry-Test.cpp
- tx_service/include/store/int_mem_store.h
- tx_service/tests/LargeObjLRU-Test.cpp
- tx_service/tests/CcPage-Test.cpp
| for (int fd : fds) | ||
| { | ||
| ::close(fd); | ||
| } | ||
| if (ok) | ||
| { | ||
| return base; | ||
| } |
There was a problem hiding this comment.
ReserveTxPortWindow() releases the reservation before consumers bind.
The probe sockets are closed before returning base, so another parallel process can claim base..base+3 before DSS/TxService starts. This can create CI flakes in parallel runs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tx_service/tests/harness/test_node.cpp` around lines 127 - 134, The
ReserveTxPortWindow() logic currently closes the probe sockets (the fds vector)
before returning base, which lets another process steal base..base+3; modify
ReserveTxPortWindow so that probe sockets are not closed when reservation
succeeds: only close the fds on failure (when ok is false) and retain them until
consumers have bound (or transfer ownership to the caller to close later);
specifically, remove or move the ::close(fd) loop out of the successful-return
path that returns base and ensure the fds remain open while returning a
successful reservation.
| // REQUIRE(txs[i]->GetStartTs() == 0U); | ||
| txs[i]->Execute(init_tx_reqs[i]); | ||
| sleep(3); // to use diffrent value of ts_base_ | ||
| sleep(3); // to use different value of ts_base_ |
There was a problem hiding this comment.
Replace fixed sleeps with bounded polling in this test path.
Line 62 keeps a hard sleep(3) (same pattern also appears at Lines 45, 83, and 105). That reintroduces timing-dependent behavior and conflicts with the Phase 1 acceptance criterion of no fixed sleeps.
Proposed deterministic wait pattern
+#include <chrono>
+#include <thread>
...
+ auto wait_for_collector = [&](uint64_t expected_min_ts) {
+ const auto deadline =
+ std::chrono::steady_clock::now() + std::chrono::seconds(3);
+ while (std::chrono::steady_clock::now() < deadline)
+ {
+ if (TxStartTsCollector::Instance().GlobalMinSiTxStartTs() ==
+ expected_min_ts)
+ {
+ return true;
+ }
+ std::this_thread::sleep_for(std::chrono::milliseconds(10));
+ }
+ return TxStartTsCollector::Instance().GlobalMinSiTxStartTs() ==
+ expected_min_ts;
+ };
...
- sleep(3); // Assure "TxStartTsCollector" has collected the min start ts.
+ REQUIRE(wait_for_collector(min_tx_ts));
...
- sleep(3); // Assure "TxStartTsCollector" has collected the min start ts.
+ REQUIRE(wait_for_collector(min_tx_ts));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tx_service/tests/StartTsCollector-Test.cpp` at line 62, The test uses fixed
sleep(3) calls to wait for ts_base_ changes; replace each fixed sleep in
StartTsCollector-Test.cpp with a bounded polling loop that repeatedly checks the
intended condition (e.g., that ts_base_ has the expected new value or the
collector reached the expected state) and breaks when satisfied, sleeping for a
short interval (10–50ms) between checks and enforcing a total timeout (e.g., 3s)
that fails the test on expiry; update every occurrence of sleep(3) in the test
to this pattern and ensure the polling inspects the same variable/condition the
test asserts (ts_base_ or the collector status) so waits are deterministic and
bounded.
| int main(int argc, char **argv) | ||
| { | ||
| gflags::ParseCommandLineFlags(&argc, &argv, true); | ||
| int ret = Catch::Session().run(argc, argv); | ||
| return ret; | ||
| return Catch::Session().run(argc, argv); | ||
| } |
There was a problem hiding this comment.
Restore gflags parsing in own-main tests before running Catch2.
Line 121 now directly runs Catch2, but this target is still categorized as an own-main test specifically to parse gflags before brpc use. Dropping parsing can break runtime flag initialization and test behavior.
Suggested fix
+#include <gflags/gflags.h>
...
int main(int argc, char **argv)
{
+ gflags::ParseCommandLineFlags(&argc, &argv, true);
return Catch::Session().run(argc, argv);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| int main(int argc, char **argv) | |
| { | |
| gflags::ParseCommandLineFlags(&argc, &argv, true); | |
| int ret = Catch::Session().run(argc, argv); | |
| return ret; | |
| return Catch::Session().run(argc, argv); | |
| } | |
| `#include` <gflags/gflags.h> | |
| int main(int argc, char **argv) | |
| { | |
| gflags::ParseCommandLineFlags(&argc, &argv, true); | |
| return Catch::Session().run(argc, argv); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tx_service/tests/StartTsCollector-Test.cpp` around lines 119 - 122, The test
main currently calls Catch::Session().run(argc, argv) directly and skips gflags
parsing; restore gflags parsing by invoking the appropriate
ParseCommandLineFlags (e.g., gflags::ParseCommandLineFlags(&argc, &argv, true)
or google::ParseCommandLineFlags) at the start of main before calling
Catch::Session().run so runtime flags are initialized for brpc and other
components (update the main function where Catch::Session().run is called).
LargeObjLRU-Test/CcPage-Test built a bare CcShard and called Init() on the main thread without BindThreadToFastMetaDataShard, leaving tls_shard_idx at SIZE_MAX. FastMetaDataMutex::lock_shared then indexed mux_ptrs_[SIZE_MAX] (vector size = core_num) — a huge out-of-bounds read of a garbage atomic*. ~50% of the time the garbage had the writer-mask bit set, busy-spinning forever during RangeBucketCcMap construction (the ~40% "teardown" hang was actually a construction-time spin); the other ~50% did an out-of-bounds CAS that silently corrupted memory. Fix: bind the main thread to shard 0 before Init(), mirroring production (TxProcessor::Run does BindThreadToFastMetaDataShard(thd_id_) then Init()). Add assert(tls_shard_idx < mux_ptrs_.size()) to lock_shared/unlock_shared so the contract is enforced loudly instead of silently spinning/corrupting. TC-INV-01 now 20/20 (was ~40% hang); CcPage-Test deterministic. Removing the corruption unmasked a separate pre-existing assert in TC-FE-01 (FirstKey on an empty page in the LO-LRU FindEmplace path) — tracked separately; LargeObjLRU stays non-gating in CI until that is fixed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two correctness fixes found while investigating store-backed TestNode (the full store-integration round-trip is deferred to e2e — the relational CompositeKey/CompositeRecord mock cannot drive EloqKv's object-only store path; that needs an ObjectCcMap-based object-record mock). - CompositeKey<Types...>::CloneTxKey() returned a bare TxKey() (null interface); any clone via TxKey::Clone() -> CloneTxKey() then deref (e.g. Hash()) segfaults. Now returns an owning clone, mirroring VoidKey::CloneTxKey and handling the +/-infinity sentinels. - MockTableSchema now owns a KVCatalogInfo (kv_table_name_ = base table name) and implements GetKVCatalogInfo/SetKVCatalogInfo instead of assert(false), so any storage-backed catalog lookup resolves the kv table name. Verified non-breaking: CcEntry/CcPage/MemDataStore/TestNodeSmoke/TxConsistency/ StartTsCollector all pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tx_service/tests/include/mock/mock_catalog_factory.h (1)
226-236: ⚡ Quick winEnforce the hash-partition invariant explicitly.
Line 226 documents that this factory is hash-partitioned-only, but the precondition is not checked before constructing the CCM. Add a fail-fast assert so accidental non-hash usage doesn’t silently route through the wrong map mode.
Suggested patch
CcMap::uptr CreatePkCcMap(const TableName &table_name, const TableSchema *table_schema, bool ccm_has_full_entries, CcShard *shard, txservice::NodeGroupId cc_ng_id) override { + assert(table_name.IsHashPartitioned() && + "MockCatalogFactory expects hash-partitioned tables"); uint64_t schema_ts = table_schema->KeySchema()->SchemaTs(); // RangePartitioned=false: tables created through this factory are // EloqKv-engine tables (TableName::IsHashPartitioned() == true), so the🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 226 - 236, The factory promises only hash-partitioned tables but doesn't assert that precondition, so add a fail-fast check before constructing the CCM: verify the table descriptor (or the TableName used by this factory) satisfies TableName::IsHashPartitioned() and fail (e.g., assert/LOG(FATAL)/CHECK) if not; place this check immediately above the TemplateCcMap<CompositeKey<int>, CompositeRecord<int>, true, false> construction in mock_catalog_factory.h so accidental range-partitioned usage is caught early.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tx_service/include/tx_key.h`:
- Around line 807-823: CompositeKey::Type() currently always returns
KeyType::Normal; update it to detect the sentinel singletons by checking
pointer/identity against NegativeInfinity() and PositiveInfinity() and return
the corresponding KeyType (e.g., KeyType::NegativeInfinity or
KeyType::PositiveInfinity) just like VoidKey::Type() does; keep the existing
Normal return for other instances so CloneTxKey()-created sentinel instances
report the correct enum via TxKey::Type().
---
Nitpick comments:
In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 226-236: The factory promises only hash-partitioned tables but
doesn't assert that precondition, so add a fail-fast check before constructing
the CCM: verify the table descriptor (or the TableName used by this factory)
satisfies TableName::IsHashPartitioned() and fail (e.g.,
assert/LOG(FATAL)/CHECK) if not; place this check immediately above the
TemplateCcMap<CompositeKey<int>, CompositeRecord<int>, true, false> construction
in mock_catalog_factory.h so accidental range-partitioned usage is caught early.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fd581728-fd46-4f5f-b75d-89bd6996ae06
📒 Files selected for processing (6)
.github/workflows/unit-tests.ymltx_service/include/tx_key.htx_service/src/cc/local_cc_shards.cpptx_service/tests/CcPage-Test.cpptx_service/tests/LargeObjLRU-Test.cpptx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (2)
- tx_service/tests/LargeObjLRU-Test.cpp
- .github/workflows/unit-tests.yml
CI: - unit-tests.yml: run container as root (--user root) so actions/checkout can write the runner file-commands; mirrors eloqkv build.yml. Fixes the EACCES that aborted the job at checkout. - clang-format-18 the changed files to satisfy the lint. CodeRabbit: - MemDataStoreFactory: return nullptr if Initialize()/StartDB() fail. - MemDataStore: DeleteRange/CreateTable/DropTable now honor read_only_. - TestNode::BeginTx: assert InitTxRequest did not error (fail fast). - CompositeKey::Type(): recognize the +/-inf sentinels (consistent with CloneTxKey), instead of always returning Normal. - Port-window TOCTOU robustness: bring up the DSS + client first and reserve the TxService port window last (only the socket-free ctor runs before Start binds it); probe on INADDR_ANY to match how brpc binds the servers; retry DSS StartService / MemDataStore-Test service start on a lost bind. Verified 20/20 sequential + 10/10 parallel bring-ups. - StartTsCollector sleeps: documented as intentional (collector sampling interval), the exemption already noted in the Phase 1 plan. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Thanks for the review! Addressed in CI failures (root causes):
Review comments:
|
…coverage, comments) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/tests/include/mock/mock_catalog_factory.h (1)
166-174:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRebuild and clone the KV catalog state instead of mutating it in place.
Line 173 deserializes into the existing
kv_info_, butKVCatalogInfo::Deserialize()only clearskv_index_names_for an empty blob. A secondSetKVCatalogInfo()can therefore keep stale index mappings, andClone()still recreates default KV info instead of preserving the deserialized state. That makes cloned/refreshed schemas diverge from the serialized catalog they were built from.Suggested fix
TableSchema::uptr Clone() const { - return std::make_unique<MockTableSchema>( - table_name_, schema_image_, version_); + auto clone = std::make_unique<MockTableSchema>( + table_name_, schema_image_, version_); + if (kv_info_ != nullptr) + { + clone->SetKVCatalogInfo(kv_info_->Serialize()); + } + return clone; }void SetKVCatalogInfo(const std::string &kv_info_str) override { - if (kv_info_ == nullptr) - { - kv_info_ = std::make_unique<KVCatalogInfo>(); - } + kv_info_ = std::make_unique<KVCatalogInfo>(); size_t offset = 0; kv_info_->Deserialize(kv_info_str.data(), offset); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 166 - 174, SetKVCatalogInfo currently deserializes into the existing kv_info_ which can leave stale state; instead, allocate a fresh KVCatalogInfo, call KVCatalogInfo::Deserialize on that new instance (so it starts from a clean slate), then replace kv_info_ with the new instance (e.g., kv_info_ = std::move(new_info)); this ensures subsequent Clone() reads the exact deserialized state rather than inheriting leftover mappings from the prior object.
🧹 Nitpick comments (1)
tx_service/tests/MemDataStore-Test.cpp (1)
133-160: ⚡ Quick winAssert that the request completion closure actually ran.
These helpers/tests only inspect
result, but the contract you are trying to pin is thatSetFinishran synchronously. In theNO_ERRORpaths a missingSetFinishcan still read like success from a default-initializedCommonResult, so these tests can miss the regression they are supposed to catch.Suggested hardening
void Put(ServiceFixture &fx, std::string_view table, int32_t partition, std::string_view key, std::string_view value, uint64_t ts, remote::CommonResult &result) { @@ fx.service().BatchWriteRecords(table, partition, ServiceFixture::kShardId, key_parts, rec_parts, tss, ttls, ops, /*skip_wal=*/true, result, &done, /*parts_cnt_per_key=*/1, /*parts_cnt_per_record=*/1); + REQUIRE(done.ran_); } @@ int ReadKey(ServiceFixture &fx, std::string_view table, int32_t partition, std::string_view key, std::string &out_record, uint64_t &out_ts) { @@ fx.service().Read(table, partition, ServiceFixture::kShardId, key, &out_record, &out_ts, &out_ttl, &result, &done); + REQUIRE(done.ran_); return result.error_code(); } @@ void Scan(ServiceFixture &fx, std::string_view table, int32_t partition, std::string_view start_key, std::string_view end_key, @@ fx.service().ScanNext(table, partition, ServiceFixture::kShardId, start_key, end_key, inclusive_start, inclusive_end, scan_forward, batch_size, /*search_conditions=*/nullptr, &items, &session_id, /*generate_session_id=*/false, &result, &done); + REQUIRE(done.ran_); } @@ fx.service().IncreaseWriteReqCount(ServiceFixture::kShardId); store.BatchWriteRecords(&req); + REQUIRE(done.ran_);Also applies to: 165-248, 505-529
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/tests/MemDataStore-Test.cpp` around lines 133 - 160, The tests call BatchWriteRecords using a NoopClosure and then immediately inspect the CommonResult, which can miss whether the completion closure (SetFinish) actually ran; replace the NoopClosure with an explicit completion closure that sets a synchronization flag (e.g., a std::atomic<bool> or a std::promise<void>/std::future) inside its Run/Invoke method and wait for that flag/future to be signaled before asserting on result. Concretely, in the helper Put (and the other similar helpers/tests noted) replace NoopClosure done with a custom closure that captures a done_flag, calls done_flag.store(true) (or promise.set_value()) when invoked, and after BatchWriteRecords wait for the flag/future to be set (with a timeout) to ensure SetFinish ran before reading CommonResult; reference the Put function, BatchWriteRecords call, NoopClosure, and CommonResult when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tx_service/tests/harness/mem_data_store.cpp`:
- Around line 250-260: The test harness must preserve the real ScanNext
continuation contract instead of unconditionally calling req->ClearSessionId();
update MemDataStore::ScanNext so that if req->generate_session_id() or
req->has_session_id() is set it either (a) implements server-side continuation
by creating/looking up an iterator stored in scan_iter_cache_ keyed by
session_id (and sets a session_id in the response when generate_session_id() is
true) and resumes iteration from that iterator, or (b) if you choose not to
implement sessions, detect req->generate_session_id()/req->has_session_id() and
immediately return a clear error (fail fast) indicating server-side session
continuation is unsupported; do not silently drop the session_id via
req->ClearSessionId(). Ensure the changes touch the ScanNext handling around the
current req->ClearSessionId() call and update response session_id behavior
accordingly.
---
Outside diff comments:
In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 166-174: SetKVCatalogInfo currently deserializes into the existing
kv_info_ which can leave stale state; instead, allocate a fresh KVCatalogInfo,
call KVCatalogInfo::Deserialize on that new instance (so it starts from a clean
slate), then replace kv_info_ with the new instance (e.g., kv_info_ =
std::move(new_info)); this ensures subsequent Clone() reads the exact
deserialized state rather than inheriting leftover mappings from the prior
object.
---
Nitpick comments:
In `@tx_service/tests/MemDataStore-Test.cpp`:
- Around line 133-160: The tests call BatchWriteRecords using a NoopClosure and
then immediately inspect the CommonResult, which can miss whether the completion
closure (SetFinish) actually ran; replace the NoopClosure with an explicit
completion closure that sets a synchronization flag (e.g., a std::atomic<bool>
or a std::promise<void>/std::future) inside its Run/Invoke method and wait for
that flag/future to be signaled before asserting on result. Concretely, in the
helper Put (and the other similar helpers/tests noted) replace NoopClosure done
with a custom closure that captures a done_flag, calls done_flag.store(true) (or
promise.set_value()) when invoked, and after BatchWriteRecords wait for the
flag/future to be set (with a timeout) to ensure SetFinish ran before reading
CommonResult; reference the Put function, BatchWriteRecords call, NoopClosure,
and CommonResult when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 478ada08-b70b-4637-a027-7d04fae2005e
📒 Files selected for processing (11)
.github/workflows/unit-tests.ymltx_service/include/tx_key.htx_service/tests/CcPage-Test.cpptx_service/tests/MemDataStore-Test.cpptx_service/tests/StartTsCollector-Test.cpptx_service/tests/TxConsistency-Test.cpptx_service/tests/harness/mem_data_store.cpptx_service/tests/harness/mem_data_store_factory.htx_service/tests/harness/test_node.cpptx_service/tests/harness/test_node.htx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (8)
- tx_service/include/tx_key.h
- .github/workflows/unit-tests.yml
- tx_service/tests/StartTsCollector-Test.cpp
- tx_service/tests/CcPage-Test.cpp
- tx_service/tests/harness/mem_data_store_factory.h
- tx_service/tests/TxConsistency-Test.cpp
- tx_service/tests/harness/test_node.cpp
- tx_service/tests/harness/test_node.h
…tract Address review feedback that ScanNext's session-less re-seek diverges from the real server-side-iterator scan contract. It does not: the production client (SinglePartitionScanner::FetchNextBatch) always advances start_key to the last returned key with inclusive_start=false, so re-seeking the ordered map from start_key yields the same batch the server-side iterator would; the session_id is only an iterator optimization. Fail-fast is not viable because the client always sets generate_session_id=true. - Clarify the ScanNext comment to state why session-less re-seek is contract- equivalent (and that a session-only resume without advancing start_key, which the client never does, would be the only divergence). - Add a paginated multi-batch scan test (batch_size=2 over 5 keys) that drives ScanNext exactly as the client does (advance start_key, generate_session_id= true, carry session_id) and asserts a complete, in-order, gap-free result. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Phase 1 of the data_substrate test framework (tracking issue #505): the Tier 0 build/CI foundation and the Tier 1 in-process single-node fixture. Brings the orphaned Catch2 tests into the build + CI, adds a real in-memory
MemDataStorebackend (replacing theassert(false)stubIntMemoryStore), and adds aTestNodefixture with synchronous transaction APIs, demonstrated by transaction-consistency tests.Design spec / plan:
eloqkvrepodocs/superpowers/specs/2026-06-13-data-substrate-test-framework-design.mdanddocs/superpowers/plans/2026-06-13-data-substrate-test-framework-phase1.md.What's included
Tier 0 — build + CI
WITH_TESTSCMake option +add_subdirectory(tx_service/tests);tx_service/tests/CMakeLists.txtrewritten as a proper subdirectory (atest_harnessstatic lib + per-file Catch2 targets, with the correctCatch2WithMainvsCatch2split)..github/workflows/unit-tests.yml: builds with-DWITH_TESTS=ONand runscteston PRs.Tier 1 — storage backend + fixture
MemDataStore(EloqDS::DataStore) +MemDataStoreFactory: a synchronous in-memory backend wired through the realDataStoreService/DataStoreServiceClientpath (not a top-level handler bypass likeIntMemoryStore). It frees pooled requests viaPoolableGuard, mirroringRocksDBDataStoreCommon.TestNodefixture: brings up a full single-nodeTxService(mock catalog, hm-less leader) and tears down cleanly;TxHandlewrapsBegin/Read/Upsert/Delete/Commit/Abortsynchronously.Tests
MemDataStore-Test— write/read/missing-key/ordered-scan round-trip through the liveDataStoreServicelocal overloads.TestNodeSmoke-Test— bring-up/teardown + write→commit→read/overwrite/delete (verified 20× clean teardown).TxConsistency-Test— abort-invisibility, commit-visibility, and a genuine OCC read-set conflict (one writer aborts on stale read-set under RepeatableRead+OccRead).StartTsCollector-Test— migrated off the hand-rolled bring-up ontoTestNode.IntMemoryStoredeleted.The 6 framework tests pass deterministically (verified twice consecutively, no residue).
Notes for reviewers
MockCatalogFactoryfix: initialized a previously-uninitializedkey_schema_(would segfault under realInitCcm) and madeCreatePkCcMaphash-partitioned to match anEloqKvtable. Other mock users (CcPage-Test) still pass.TestNoderuns cc-only (skip_kv=true) for consistency tests — appropriate since transaction consistency is a cc-layer property, faster and deterministic.MemDataStoreitself is exercised through the real service path byMemDataStore-Test. A storage-backedTestNodemode (skip_kv=false+MemDataStore, for checkpoint/eviction/recovery tests) is the natural next increment.LargeObjLRU-Testis quarantined non-gating in CI. Running it newly surfaced a pre-existing engine teardown race (not introduced here):LocalCcShards::Terminate()→WorkerThreadContext::Terminate()notifies a plaincv_, but a data-sync/flush worker can be parked on thebthread::ConditionVariablemem_cv_(flush-memory quota) and never woken →join()hangs. It reproduces ~40% on a single clean teardown under large-object memory pressure and vanishes under gdb (timing). Worth a separate engine fix.bthread_concurrencycap is set in theTestNodector before any brpc server starts (mirroringcore/src/data_substrate.cpp), required becauseELOQ_MODULE_ENABLEDasserts worker index <core_num.Closes #505 (Phase 1).
🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
Chores