Skip to content

test: Phase 1 unit-test framework — MemDataStore backend + TestNode fixture#506

Merged
liunyl merged 16 commits into
mainfrom
feat/test-framework-phase1
Jun 13, 2026
Merged

test: Phase 1 unit-test framework — MemDataStore backend + TestNode fixture#506
liunyl merged 16 commits into
mainfrom
feat/test-framework-phase1

Conversation

@liunyl

@liunyl liunyl commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 1 of the data_substrate test framework (tracking issue #505): the Tier 0 build/CI foundation and the Tier 1 in-process single-node fixture. Brings the orphaned Catch2 tests into the build + CI, adds a real in-memory MemDataStore backend (replacing the assert(false) stub IntMemoryStore), and adds a TestNode fixture with synchronous transaction APIs, demonstrated by transaction-consistency tests.

Design spec / plan: eloqkv repo docs/superpowers/specs/2026-06-13-data-substrate-test-framework-design.md and docs/superpowers/plans/2026-06-13-data-substrate-test-framework-phase1.md.

What's included

Tier 0 — build + CI

  • WITH_TESTS CMake option + add_subdirectory(tx_service/tests); tx_service/tests/CMakeLists.txt rewritten as a proper subdirectory (a test_harness static lib + per-file Catch2 targets, with the correct Catch2WithMain vs Catch2 split).
  • .github/workflows/unit-tests.yml: builds with -DWITH_TESTS=ON and runs ctest on PRs.

Tier 1 — storage backend + fixture

  • MemDataStore (EloqDS::DataStore) + MemDataStoreFactory: a synchronous in-memory backend wired through the real DataStoreService/DataStoreServiceClient path (not a top-level handler bypass like IntMemoryStore). It frees pooled requests via PoolableGuard, mirroring RocksDBDataStoreCommon.
  • TestNode fixture: brings up a full single-node TxService (mock catalog, hm-less leader) and tears down cleanly; TxHandle wraps Begin/Read/Upsert/Delete/Commit/Abort synchronously.

Tests

  • MemDataStore-Test — write/read/missing-key/ordered-scan round-trip through the live DataStoreService local overloads.
  • TestNodeSmoke-Test — bring-up/teardown + write→commit→read/overwrite/delete (verified 20× clean teardown).
  • TxConsistency-Test — abort-invisibility, commit-visibility, and a genuine OCC read-set conflict (one writer aborts on stale read-set under RepeatableRead+OccRead).
  • StartTsCollector-Test — migrated off the hand-rolled bring-up onto TestNode.
  • IntMemoryStore deleted.

The 6 framework tests pass deterministically (verified twice consecutively, no residue).

Notes for reviewers

  • MockCatalogFactory fix: initialized a previously-uninitialized key_schema_ (would segfault under real InitCcm) and made CreatePkCcMap hash-partitioned to match an EloqKv table. Other mock users (CcPage-Test) still pass.
  • TestNode runs cc-only (skip_kv=true) for consistency tests — appropriate since transaction consistency is a cc-layer property, faster and deterministic. MemDataStore itself is exercised through the real service path by MemDataStore-Test. A storage-backed TestNode mode (skip_kv=false + MemDataStore, for checkpoint/eviction/recovery tests) is the natural next increment.
  • LargeObjLRU-Test is quarantined non-gating in CI. Running it newly surfaced a pre-existing engine teardown race (not introduced here): LocalCcShards::Terminate()WorkerThreadContext::Terminate() notifies a plain cv_, but a data-sync/flush worker can be parked on the bthread::ConditionVariable mem_cv_ (flush-memory quota) and never woken → join() hangs. It reproduces ~40% on a single clean teardown under large-object memory pressure and vanishes under gdb (timing). Worth a separate engine fix.
  • A bthread_concurrency cap is set in the TestNode ctor before any brpc server starts (mirroring core/src/data_substrate.cpp), required because ELOQ_MODULE_ENABLED asserts worker index < core_num.

Closes #505 (Phase 1).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • CI now runs unit tests on PRs and main in a two-phase workflow (gated run + non-blocking run for a long test).
    • Added an opt-in build flag to enable tests and integrated many new/updated Catch2 suites exercising transactions, datastore behavior, scans, and consistency.
    • Introduced an in-process test harness and an in-memory test datastore for reliable local testing.
  • Chores

    • Refactored test build/configuration for cleaner integration and maintainability.

liunyl and others added 11 commits June 13, 2026 05:15
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ice path

MemDataStore now releases the pooled *LocalRequest it receives (PoolableGuard,
mirroring RocksDBDataStoreCommon); otherwise the DataStoreService thread_local
ObjectPool destructor spins forever at thread exit. The round-trip test now
exercises the real DataStoreService local Read/BatchWriteRecords overloads
instead of stack requests on a captured DataStore*.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Wire TxHandle::Upsert/Delete/Read over UpsertTxRequest/ReadTxRequest and
make the EloqKv test table resolvable in the mock-catalog TxService:

- TestNode now brings the engine up with skip_kv=true,
  enable_cache_replacement=false, and a prebuilt-tables map for the test
  table, so its schema is installed at leader start and Read/Upsert resolve
  a CcMap without a KV catalog fetch. store_hd is passed as nullptr to the
  TxService (the checkpointer early-returns and never tries to data-sync the
  in-memory-only table); the DSS bring-up stays as scaffolding.
- MockCatalogFactory: initialize MockTableSchema::key_schema_ (InitCcm/
  CreatePkCcMap dereference KeySchema()) and create the Pk CcMap as
  hash-partitioned (RangePartitioned=false) to match an EloqKv table, so an
  absent-key read takes the full-entries fast path instead of PinRangeSlice.
- TestNodeSmoke-Test proves a write->commit->read==value round-trip, an
  absent-key read returning false, overwrite, and delete, all within a single
  TestNode (the Sharder singleton allows only one per process).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The last code reference was removed when StartTsCollector-Test migrated to
the TestNode fixture; only a descriptive comment in mem_data_store.h remains.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3e20c1f-e399-495b-b0ee-6de5e863357b

📥 Commits

Reviewing files that changed from the base of the PR and between c5fce28 and bc99820.

📒 Files selected for processing (2)
  • tx_service/tests/MemDataStore-Test.cpp
  • tx_service/tests/harness/mem_data_store.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • tx_service/tests/MemDataStore-Test.cpp
  • tx_service/tests/harness/mem_data_store.cpp

Walkthrough

Phase 1 wires tests into CI, adds a CMake WITH_TESTS gate, implements an in-memory MemDataStore + factory, introduces the TestNode in-process fixture and TxHandle helpers, migrates and expands Catch2 tests, and applies supporting fixes for keys and fast-meta-data shard safety.

Changes

Phase 1 Test Framework: CI Integration, MemDataStore Backend, TestNode Harness

Layer / File(s) Summary
CI Workflow and CMake Integration
.github/workflows/unit-tests.yml, CMakeLists.txt
GitHub Actions workflow runs on PR/push with gating and stress-test phases; root CMakeLists adds WITH_TESTS option and conditionally enables tx_service/tests subdirectory.
Test CMake Refactoring and Harness
tx_service/tests/CMakeLists.txt
Refactored to integrated build: adds test_harness library, Catch2 integration, two test categories (Catch-with-main vs own-main), and add_substrate_test() helper that wires includes, links, C++20, and test discovery.
MemDataStore Backend Header
tx_service/tests/harness/mem_data_store.h
Declares EloqDS::MemDataStore in-memory DataStore test backend implementing the full DataStore API with mutex-protected nested-map storage and read-only gating.
MemDataStore Backend Implementation
tx_service/tests/harness/mem_data_store.cpp
Implements lifecycle, Read, BatchWriteRecords, FlushData, DeleteRange, Create/Drop table, ScanNext/ScanClose (forward/backward, bounds, batch), snapshot no-op, and read/write mode toggles.
MemDataStoreFactory
tx_service/tests/harness/mem_data_store_factory.h
Factory constructs/initializes MemDataStore, conditionally calls StartDB with compile-time overload, and returns neutral/default storage/S3/AWS values with IsCloudMode() = false.
TestNode Harness Declaration
tx_service/tests/harness/test_node.h
Adds TestNodeOptions, TxHandle, and helpers Key() / Rec(); declares TestNode RAII fixture with BeginTx overloads and accessors.
TestNode Harness Implementation
tx_service/tests/harness/test_node.cpp
Implements socket port reservation, temp-dir creation, in-process DataStoreService startup with MemDataStore, local client hookup, prebuilt table registration, TxService start in hm-less mode, teardown, and TxHandle ops.
TxHandle transaction ops
tx_service/tests/harness/test_node.cpp
Implements TxHandle::Commit(), Abort(), Upsert(), Delete(), and Read() by driving request objects and mapping results to booleans/outputs.
Test migrations & new tests
tx_service/tests/*
Removes custom mains where appropriate to use Catch2 main, binds test threads to fast-meta-data shard 0 where required, migrates StartTsCollector to TestNode, and adds/expands MemDataStore-Test, TestNodeSmoke-Test, and TxConsistency-Test suites.
Mock catalog & key/CC fixes
tx_service/tests/include/mock/mock_catalog_factory.h, tx_service/include/tx_key.h, tx_service/src/cc/local_cc_shards.cpp
Initializes MockTableSchema KV info/version storage and SetKVCatalogInfo; CompositeKey CloneTxKey/Type handle ±∞ sentinels correctly; FastMetaDataMutex asserts caller is bound to a valid fast-meta-data shard before indexing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • eloqdata/tx_service#426: Related work on LargeObjLRU policy and test wiring; overlaps with LargeObjLRU-Test changes.

Suggested reviewers

  • lokax
  • lzxddz

Poem

🐰 I hopped through code to stitch the test,

A mem-store burrow where records rest,
TestNode hums, ports safely kept,
Catch runs, no sleeps — the CI leapt,
A little rabbit cheers: hop, test, jest!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the PR's primary objectives: implementing Phase 1 of the test framework with MemDataStore backend and TestNode fixture.
Description check ✅ Passed The PR description comprehensively covers Phase 1 implementation (Tier 0 build/CI and Tier 1 storage/fixture), lists all included components and tests, and addresses design decisions and known issues.
Linked Issues check ✅ Passed The PR fulfills Phase 1 requirements from #505: CMake WITH_TESTS option, CI workflow (unit-tests.yml), MemDataStore implementation, TestNode fixture with synchronous APIs, transaction-consistency tests, and IntMemoryStore removal all present and complete.
Out of Scope Changes check ✅ Passed All changes are scoped to Phase 1: build/CI setup, test harness implementation, MemDataStore/MemDataStoreFactory, TestNode fixture, test migrations, and necessary mock/utility updates. No extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/test-framework-phase1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (3)
.github/workflows/unit-tests.yml (2)

15-18: ⚡ Quick win

Consider hardening checkout step for supply-chain security.

Two security improvements:

  • Pin actions/checkout to a commit SHA rather than a mutable tag (@v4) to prevent supply-chain attacks via action updates.
  • Add persist-credentials: false to prevent the GitHub token from being available to subsequent steps (defense in depth).
🔒 Proposed hardening
     - name: Checkout (recursive submodules)
-      uses: actions/checkout@v4
+      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
       with:
         submodules: recursive
+        persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unit-tests.yml around lines 15 - 18, Update the GitHub
Actions checkout step to harden supply-chain security by replacing the mutable
tag uses: actions/checkout@v4 with a specific commit SHA (pin to a full commit
hash for actions/checkout) and add the option persist-credentials: false while
preserving existing options like submodules: recursive; modify the checkout step
(the "Checkout (recursive submodules)" step) to include the pinned ref and
persist-credentials: false.

Source: Linters/SAST tools


12-12: ⚡ Quick win

Pin container image to a digest or versioned tag.

The :latest tag is mutable and can change unexpectedly, breaking build reproducibility. Pin to a specific digest (@sha256:...) or use a versioned tag (e.g., :v1.2.3) to ensure deterministic builds.

📌 Example pinning to digest
   container:
-    image: eloqdata/eloq-dev-ci-ubuntu2404:latest
+    image: eloqdata/eloq-dev-ci-ubuntu2404@sha256:<digest>

Obtain the digest via:

docker pull eloqdata/eloq-dev-ci-ubuntu2404:latest
docker inspect --format='{{index .RepoDigests 0}}' eloqdata/eloq-dev-ci-ubuntu2404:latest
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unit-tests.yml at line 12, Replace the mutable image
reference "image: eloqdata/eloq-dev-ci-ubuntu2404:latest" with a pinned,
immutable identifier: either a specific versioned tag (e.g., :vX.Y.Z) or the
image digest form (`@sha256`:...), so CI uses a deterministic image; obtain the
digest locally via docker pull and docker inspect (as suggested) and update the
workflow line to use that digest or chosen versioned tag.

Source: Linters/SAST tools

tx_service/tests/CMakeLists.txt (1)

11-23: 💤 Low value

Consider disabling compiler extensions on test_harness for consistency.

Line 47 sets CXX_EXTENSIONS OFF on each test executable, but test_harness (line 22) only sets CXX_STANDARD 20. While not a functional issue (each test explicitly disables extensions), applying the same property to the harness library improves consistency and prevents accidental reliance on compiler-specific extensions in shared test utilities.

🔧 Proposed consistency fix
 target_link_libraries(test_harness PUBLIC data_substrate)
 set_property(TARGET test_harness PROPERTY CXX_STANDARD 20)
+set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/CMakeLists.txt` around lines 11 - 23, The test_harness
target currently sets only CXX_STANDARD; disable compiler-specific extensions
for consistency by setting the CXX_EXTENSIONS property OFF on the test_harness
target (e.g., add set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF)
alongside the existing set_property(TARGET test_harness PROPERTY CXX_STANDARD
20)), ensuring shared test utilities cannot rely on compiler extensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tx_service/tests/harness/mem_data_store_factory.h`:
- Around line 21-29: CreateDataStore currently calls MemDataStore::Initialize()
and MemDataStore::StartDB(...) but ignores their failures; change
CreateDataStore to check the return/result of ds->Initialize() and of
ds->StartDB(term[, shard_id]) and, on failure, clean up and return a failure
indicator (e.g., nullptr or an error) instead of returning a started-but-broken
MemDataStore; ensure both code paths (the `#ifdef` branch and the else branch)
perform the same checks and propagate the failure, and add brief logging for the
failure path to aid debugging.

In `@tx_service/tests/harness/mem_data_store.cpp`:
- Around line 141-195: DeleteRange, CreateTable, and DropTable mutate store_ but
do not honor read_only_; add the same read-only guard used in BatchWriteRecords:
at the start of each method (after PoolableGuard construction) check if
read_only_ is true and if so construct a ::EloqDS::remote::CommonResult with
error_code ::EloqDS::remote::DataStoreError::READ_ONLY and immediately call
req->SetFinish(result) and return, otherwise proceed with the existing locking
and mutation logic; reference the methods DeleteRange, CreateTable, DropTable
and the PoolableGuard usage so you place the early-return check consistently in
each.

In `@tx_service/tests/harness/test_node.cpp`:
- Around line 127-134: The ReserveTxPortWindow() logic currently closes the
probe sockets (the fds vector) before returning base, which lets another process
steal base..base+3; modify ReserveTxPortWindow so that probe sockets are not
closed when reservation succeeds: only close the fds on failure (when ok is
false) and retain them until consumers have bound (or transfer ownership to the
caller to close later); specifically, remove or move the ::close(fd) loop out of
the successful-return path that returns base and ensure the fds remain open
while returning a successful reservation.
- Around line 407-411: In BeginTx, do not ignore InitTxRequest failures: after
calling init.Wait() check the InitTxRequest result/status (the response or error
field on the InitTxRequest instance) and if it indicates failure, fail fast by
propagating the error (return an error/invalid TxHandle or throw) instead of
returning a normal TxHandle; update the code around InitTxRequest init(...),
init.Reset(), txm->Execute(&init), init.Wait() so failed initialization
short-circuits and does not construct or return a successful TxHandle.

In `@tx_service/tests/MemDataStore-Test.cpp`:
- Around line 25-43: The test currently picks a free port with PickFreePort()
but calls DataStoreService::StartService() only once, which races if another
process grabs the port; wrap the StartService call in a short retry loop (e.g.,
3–5 attempts) that on bind failure (EADDRINUSE or StartService failure
indicating port in use) calls PickFreePort() again and re-invokes
DataStoreService::StartService(), with a tiny backoff between attempts, and fail
only after retries are exhausted; reference PickFreePort() and
DataStoreService::StartService() when locating the change.

In `@tx_service/tests/StartTsCollector-Test.cpp`:
- Line 62: The test uses fixed sleep(3) calls to wait for ts_base_ changes;
replace each fixed sleep in StartTsCollector-Test.cpp with a bounded polling
loop that repeatedly checks the intended condition (e.g., that ts_base_ has the
expected new value or the collector reached the expected state) and breaks when
satisfied, sleeping for a short interval (10–50ms) between checks and enforcing
a total timeout (e.g., 3s) that fails the test on expiry; update every
occurrence of sleep(3) in the test to this pattern and ensure the polling
inspects the same variable/condition the test asserts (ts_base_ or the collector
status) so waits are deterministic and bounded.
- Around line 119-122: The test main currently calls Catch::Session().run(argc,
argv) directly and skips gflags parsing; restore gflags parsing by invoking the
appropriate ParseCommandLineFlags (e.g., gflags::ParseCommandLineFlags(&argc,
&argv, true) or google::ParseCommandLineFlags) at the start of main before
calling Catch::Session().run so runtime flags are initialized for brpc and other
components (update the main function where Catch::Session().run is called).

---

Nitpick comments:
In @.github/workflows/unit-tests.yml:
- Around line 15-18: Update the GitHub Actions checkout step to harden
supply-chain security by replacing the mutable tag uses: actions/checkout@v4
with a specific commit SHA (pin to a full commit hash for actions/checkout) and
add the option persist-credentials: false while preserving existing options like
submodules: recursive; modify the checkout step (the "Checkout (recursive
submodules)" step) to include the pinned ref and persist-credentials: false.
- Line 12: Replace the mutable image reference "image:
eloqdata/eloq-dev-ci-ubuntu2404:latest" with a pinned, immutable identifier:
either a specific versioned tag (e.g., :vX.Y.Z) or the image digest form
(`@sha256`:...), so CI uses a deterministic image; obtain the digest locally via
docker pull and docker inspect (as suggested) and update the workflow line to
use that digest or chosen versioned tag.

In `@tx_service/tests/CMakeLists.txt`:
- Around line 11-23: The test_harness target currently sets only CXX_STANDARD;
disable compiler-specific extensions for consistency by setting the
CXX_EXTENSIONS property OFF on the test_harness target (e.g., add
set_property(TARGET test_harness PROPERTY CXX_EXTENSIONS OFF) alongside the
existing set_property(TARGET test_harness PROPERTY CXX_STANDARD 20)), ensuring
shared test utilities cannot rely on compiler extensions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bdeee519-854e-4d33-a851-88baffddd50a

📥 Commits

Reviewing files that changed from the base of the PR and between ca14c3c and 24b94a3.

📒 Files selected for processing (17)
  • .github/workflows/unit-tests.yml
  • CMakeLists.txt
  • tx_service/include/store/int_mem_store.h
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcEntry-Test.cpp
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/MemDataStore-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/TestNodeSmoke-Test.cpp
  • tx_service/tests/TxConsistency-Test.cpp
  • tx_service/tests/harness/mem_data_store.cpp
  • tx_service/tests/harness/mem_data_store.h
  • tx_service/tests/harness/mem_data_store_factory.h
  • tx_service/tests/harness/test_node.cpp
  • tx_service/tests/harness/test_node.h
  • tx_service/tests/include/mock/mock_catalog_factory.h
💤 Files with no reviewable changes (4)
  • tx_service/tests/CcEntry-Test.cpp
  • tx_service/include/store/int_mem_store.h
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/CcPage-Test.cpp

Comment thread tx_service/tests/harness/mem_data_store_factory.h
Comment thread tx_service/tests/harness/mem_data_store.cpp
Comment on lines +127 to +134
for (int fd : fds)
{
::close(fd);
}
if (ok)
{
return base;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

ReserveTxPortWindow() releases the reservation before consumers bind.

The probe sockets are closed before returning base, so another parallel process can claim base..base+3 before DSS/TxService starts. This can create CI flakes in parallel runs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/harness/test_node.cpp` around lines 127 - 134, The
ReserveTxPortWindow() logic currently closes the probe sockets (the fds vector)
before returning base, which lets another process steal base..base+3; modify
ReserveTxPortWindow so that probe sockets are not closed when reservation
succeeds: only close the fds on failure (when ok is false) and retain them until
consumers have bound (or transfer ownership to the caller to close later);
specifically, remove or move the ::close(fd) loop out of the successful-return
path that returns base and ensure the fds remain open while returning a
successful reservation.

Comment thread tx_service/tests/harness/test_node.cpp
Comment thread tx_service/tests/MemDataStore-Test.cpp
// REQUIRE(txs[i]->GetStartTs() == 0U);
txs[i]->Execute(init_tx_reqs[i]);
sleep(3); // to use diffrent value of ts_base_
sleep(3); // to use different value of ts_base_

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace fixed sleeps with bounded polling in this test path.

Line 62 keeps a hard sleep(3) (same pattern also appears at Lines 45, 83, and 105). That reintroduces timing-dependent behavior and conflicts with the Phase 1 acceptance criterion of no fixed sleeps.

Proposed deterministic wait pattern
+#include <chrono>
+#include <thread>
...
+    auto wait_for_collector = [&](uint64_t expected_min_ts) {
+        const auto deadline =
+            std::chrono::steady_clock::now() + std::chrono::seconds(3);
+        while (std::chrono::steady_clock::now() < deadline)
+        {
+            if (TxStartTsCollector::Instance().GlobalMinSiTxStartTs() ==
+                expected_min_ts)
+            {
+                return true;
+            }
+            std::this_thread::sleep_for(std::chrono::milliseconds(10));
+        }
+        return TxStartTsCollector::Instance().GlobalMinSiTxStartTs() ==
+               expected_min_ts;
+    };
...
-    sleep(3);  // Assure "TxStartTsCollector" has collected the min start ts.
+    REQUIRE(wait_for_collector(min_tx_ts));
...
-    sleep(3);  // Assure "TxStartTsCollector" has collected the min start ts.
+    REQUIRE(wait_for_collector(min_tx_ts));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/StartTsCollector-Test.cpp` at line 62, The test uses fixed
sleep(3) calls to wait for ts_base_ changes; replace each fixed sleep in
StartTsCollector-Test.cpp with a bounded polling loop that repeatedly checks the
intended condition (e.g., that ts_base_ has the expected new value or the
collector reached the expected state) and breaks when satisfied, sleeping for a
short interval (10–50ms) between checks and enforcing a total timeout (e.g., 3s)
that fails the test on expiry; update every occurrence of sleep(3) in the test
to this pattern and ensure the polling inspects the same variable/condition the
test asserts (ts_base_ or the collector status) so waits are deterministic and
bounded.

Comment on lines 119 to 122
int main(int argc, char **argv)
{
gflags::ParseCommandLineFlags(&argc, &argv, true);
int ret = Catch::Session().run(argc, argv);
return ret;
return Catch::Session().run(argc, argv);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore gflags parsing in own-main tests before running Catch2.

Line 121 now directly runs Catch2, but this target is still categorized as an own-main test specifically to parse gflags before brpc use. Dropping parsing can break runtime flag initialization and test behavior.

Suggested fix
+#include <gflags/gflags.h>
...
int main(int argc, char **argv)
{
+    gflags::ParseCommandLineFlags(&argc, &argv, true);
     return Catch::Session().run(argc, argv);
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
int main(int argc, char **argv)
{
gflags::ParseCommandLineFlags(&argc, &argv, true);
int ret = Catch::Session().run(argc, argv);
return ret;
return Catch::Session().run(argc, argv);
}
`#include` <gflags/gflags.h>
int main(int argc, char **argv)
{
gflags::ParseCommandLineFlags(&argc, &argv, true);
return Catch::Session().run(argc, argv);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/StartTsCollector-Test.cpp` around lines 119 - 122, The test
main currently calls Catch::Session().run(argc, argv) directly and skips gflags
parsing; restore gflags parsing by invoking the appropriate
ParseCommandLineFlags (e.g., gflags::ParseCommandLineFlags(&argc, &argv, true)
or google::ParseCommandLineFlags) at the start of main before calling
Catch::Session().run so runtime flags are initialized for brpc and other
components (update the main function where Catch::Session().run is called).

liunyl and others added 2 commits June 13, 2026 08:35
LargeObjLRU-Test/CcPage-Test built a bare CcShard and called Init() on the
main thread without BindThreadToFastMetaDataShard, leaving tls_shard_idx at
SIZE_MAX. FastMetaDataMutex::lock_shared then indexed mux_ptrs_[SIZE_MAX]
(vector size = core_num) — a huge out-of-bounds read of a garbage atomic*.
~50% of the time the garbage had the writer-mask bit set, busy-spinning
forever during RangeBucketCcMap construction (the ~40% "teardown" hang was
actually a construction-time spin); the other ~50% did an out-of-bounds CAS
that silently corrupted memory.

Fix: bind the main thread to shard 0 before Init(), mirroring production
(TxProcessor::Run does BindThreadToFastMetaDataShard(thd_id_) then Init()).
Add assert(tls_shard_idx < mux_ptrs_.size()) to lock_shared/unlock_shared so
the contract is enforced loudly instead of silently spinning/corrupting.

TC-INV-01 now 20/20 (was ~40% hang); CcPage-Test deterministic. Removing the
corruption unmasked a separate pre-existing assert in TC-FE-01 (FirstKey on an
empty page in the LO-LRU FindEmplace path) — tracked separately; LargeObjLRU
stays non-gating in CI until that is fixed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two correctness fixes found while investigating store-backed TestNode (the
full store-integration round-trip is deferred to e2e — the relational
CompositeKey/CompositeRecord mock cannot drive EloqKv's object-only store
path; that needs an ObjectCcMap-based object-record mock).

- CompositeKey<Types...>::CloneTxKey() returned a bare TxKey() (null
  interface); any clone via TxKey::Clone() -> CloneTxKey() then deref (e.g.
  Hash()) segfaults. Now returns an owning clone, mirroring VoidKey::CloneTxKey
  and handling the +/-infinity sentinels.
- MockTableSchema now owns a KVCatalogInfo (kv_table_name_ = base table name)
  and implements GetKVCatalogInfo/SetKVCatalogInfo instead of assert(false),
  so any storage-backed catalog lookup resolves the kv table name.

Verified non-breaking: CcEntry/CcPage/MemDataStore/TestNodeSmoke/TxConsistency/
StartTsCollector all pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tx_service/tests/include/mock/mock_catalog_factory.h (1)

226-236: ⚡ Quick win

Enforce the hash-partition invariant explicitly.

Line 226 documents that this factory is hash-partitioned-only, but the precondition is not checked before constructing the CCM. Add a fail-fast assert so accidental non-hash usage doesn’t silently route through the wrong map mode.

Suggested patch
     CcMap::uptr CreatePkCcMap(const TableName &table_name,
                               const TableSchema *table_schema,
                               bool ccm_has_full_entries,
                               CcShard *shard,
                               txservice::NodeGroupId cc_ng_id) override
     {
+        assert(table_name.IsHashPartitioned() &&
+               "MockCatalogFactory expects hash-partitioned tables");
         uint64_t schema_ts = table_schema->KeySchema()->SchemaTs();
         // RangePartitioned=false: tables created through this factory are
         // EloqKv-engine tables (TableName::IsHashPartitioned() == true), so the
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 226 - 236,
The factory promises only hash-partitioned tables but doesn't assert that
precondition, so add a fail-fast check before constructing the CCM: verify the
table descriptor (or the TableName used by this factory) satisfies
TableName::IsHashPartitioned() and fail (e.g., assert/LOG(FATAL)/CHECK) if not;
place this check immediately above the TemplateCcMap<CompositeKey<int>,
CompositeRecord<int>, true, false> construction in mock_catalog_factory.h so
accidental range-partitioned usage is caught early.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tx_service/include/tx_key.h`:
- Around line 807-823: CompositeKey::Type() currently always returns
KeyType::Normal; update it to detect the sentinel singletons by checking
pointer/identity against NegativeInfinity() and PositiveInfinity() and return
the corresponding KeyType (e.g., KeyType::NegativeInfinity or
KeyType::PositiveInfinity) just like VoidKey::Type() does; keep the existing
Normal return for other instances so CloneTxKey()-created sentinel instances
report the correct enum via TxKey::Type().

---

Nitpick comments:
In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 226-236: The factory promises only hash-partitioned tables but
doesn't assert that precondition, so add a fail-fast check before constructing
the CCM: verify the table descriptor (or the TableName used by this factory)
satisfies TableName::IsHashPartitioned() and fail (e.g.,
assert/LOG(FATAL)/CHECK) if not; place this check immediately above the
TemplateCcMap<CompositeKey<int>, CompositeRecord<int>, true, false> construction
in mock_catalog_factory.h so accidental range-partitioned usage is caught early.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd581728-fd46-4f5f-b75d-89bd6996ae06

📥 Commits

Reviewing files that changed from the base of the PR and between 24b94a3 and 45a0e7e.

📒 Files selected for processing (6)
  • .github/workflows/unit-tests.yml
  • tx_service/include/tx_key.h
  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (2)
  • tx_service/tests/LargeObjLRU-Test.cpp
  • .github/workflows/unit-tests.yml

Comment thread tx_service/include/tx_key.h
CI:
- unit-tests.yml: run container as root (--user root) so actions/checkout
  can write the runner file-commands; mirrors eloqkv build.yml. Fixes the
  EACCES that aborted the job at checkout.
- clang-format-18 the changed files to satisfy the lint.

CodeRabbit:
- MemDataStoreFactory: return nullptr if Initialize()/StartDB() fail.
- MemDataStore: DeleteRange/CreateTable/DropTable now honor read_only_.
- TestNode::BeginTx: assert InitTxRequest did not error (fail fast).
- CompositeKey::Type(): recognize the +/-inf sentinels (consistent with
  CloneTxKey), instead of always returning Normal.
- Port-window TOCTOU robustness: bring up the DSS + client first and reserve
  the TxService port window last (only the socket-free ctor runs before
  Start binds it); probe on INADDR_ANY to match how brpc binds the servers;
  retry DSS StartService / MemDataStore-Test service start on a lost bind.
  Verified 20/20 sequential + 10/10 parallel bring-ups.
- StartTsCollector sleeps: documented as intentional (collector sampling
  interval), the exemption already noted in the Phase 1 plan.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@liunyl

liunyl commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! Addressed in 1bd93e7 (+ earlier 45a0e7e). Summary:

CI failures (root causes):

  • unit-tests was dying at actions/checkout with EACCES writing /__w/_temp/_runner_file_commands/... — the container's default user can't write the runner temp. Fixed by running the container as root (options: --user root), matching eloqdata/eloqkv's build.yml.
  • build was the clang-format lint; reformatted the changed files with clang-format-18 + the repo .clang-format (matches the DoozyX clangFormatVersion: 18).

Review comments:

  • mem_data_store_factory.hCreateDataStore now returns nullptr if Initialize()/StartDB() fail (mirrors RocksDBDataStoreFactory).
  • mem_data_store.cppDeleteRange/CreateTable/DropTable now reject writes under read_only_ (return WRITE_TO_READ_ONLY_DB), consistent with BatchWriteRecords.
  • test_node.cpp:411BeginTx now assert(!init.IsError()) to fail fast.
  • tx_key.h:823CompositeKey::Type() now recognizes the ±∞ sentinels (consistent with CloneTxKey) instead of always returning Normal.
  • test_node.cpp:134 & MemDataStore-Test.cpp:43 (port TOCTOU) — restructured bring-up so the DSS + client come up first and the TxService port window is reserved last, leaving only the socket-free TxService ctor between probe and Start's bind; the probe now binds INADDR_ANY to match how brpc binds the servers (the old loopback probe didn't predict the 0.0.0.0 server bind); and DSS StartService / the MemDataStore-Test service start retry on a lost bind. (Full Start retry isn't possible — it initializes the process-wide Sharder singleton, which can't be cleanly re-Init'd — so reserving the window late is the fix.) Verified 20/20 sequential + 10/10 parallel bring-ups.
  • 💬 StartTsCollector-Test.cpp:62 (sleeps) — kept, with an explanatory comment. These are semantic waits for TxStartTsCollector's periodic sampling interval, not arbitrary sync; the "no fixed sleeps" criterion explicitly exempts StartTsCollector for exactly this reason.
  • StartTsCollector-Test.cpp:122 (restore gflags) — intentionally not applied. gflags::ParseCommandLineFlags(&argc,&argv,true) is what was causing failures: catch_discover_tests runs the test binary with Catch2's own flags at build time, and gflags strict-parse aborts on those unknown flags (deleting the executable and breaking the build). brpc runs fine on its flag defaults here, so removing the parse is required, not optional.
  • unit-tests.yml checkout hardening (pin SHA / persist-credentials: false) — left as @v4 to match the repo's existing build.yml/release.yml convention; happy to pin across all workflows in a separate change if preferred.

…coverage, comments)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tx_service/tests/include/mock/mock_catalog_factory.h (1)

166-174: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rebuild and clone the KV catalog state instead of mutating it in place.

Line 173 deserializes into the existing kv_info_, but KVCatalogInfo::Deserialize() only clears kv_index_names_ for an empty blob. A second SetKVCatalogInfo() can therefore keep stale index mappings, and Clone() still recreates default KV info instead of preserving the deserialized state. That makes cloned/refreshed schemas diverge from the serialized catalog they were built from.

Suggested fix
     TableSchema::uptr Clone() const
     {
-        return std::make_unique<MockTableSchema>(
-            table_name_, schema_image_, version_);
+        auto clone = std::make_unique<MockTableSchema>(
+            table_name_, schema_image_, version_);
+        if (kv_info_ != nullptr)
+        {
+            clone->SetKVCatalogInfo(kv_info_->Serialize());
+        }
+        return clone;
     }
     void SetKVCatalogInfo(const std::string &kv_info_str) override
     {
-        if (kv_info_ == nullptr)
-        {
-            kv_info_ = std::make_unique<KVCatalogInfo>();
-        }
+        kv_info_ = std::make_unique<KVCatalogInfo>();
         size_t offset = 0;
         kv_info_->Deserialize(kv_info_str.data(), offset);
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 166 - 174,
SetKVCatalogInfo currently deserializes into the existing kv_info_ which can
leave stale state; instead, allocate a fresh KVCatalogInfo, call
KVCatalogInfo::Deserialize on that new instance (so it starts from a clean
slate), then replace kv_info_ with the new instance (e.g., kv_info_ =
std::move(new_info)); this ensures subsequent Clone() reads the exact
deserialized state rather than inheriting leftover mappings from the prior
object.
🧹 Nitpick comments (1)
tx_service/tests/MemDataStore-Test.cpp (1)

133-160: ⚡ Quick win

Assert that the request completion closure actually ran.

These helpers/tests only inspect result, but the contract you are trying to pin is that SetFinish ran synchronously. In the NO_ERROR paths a missing SetFinish can still read like success from a default-initialized CommonResult, so these tests can miss the regression they are supposed to catch.

Suggested hardening
 void Put(ServiceFixture &fx,
          std::string_view table,
          int32_t partition,
          std::string_view key,
          std::string_view value,
          uint64_t ts,
          remote::CommonResult &result)
 {
@@
     fx.service().BatchWriteRecords(table,
                                    partition,
                                    ServiceFixture::kShardId,
                                    key_parts,
                                    rec_parts,
                                    tss,
                                    ttls,
                                    ops,
                                    /*skip_wal=*/true,
                                    result,
                                    &done,
                                    /*parts_cnt_per_key=*/1,
                                    /*parts_cnt_per_record=*/1);
+    REQUIRE(done.ran_);
 }
@@
 int ReadKey(ServiceFixture &fx,
             std::string_view table,
             int32_t partition,
             std::string_view key,
             std::string &out_record,
             uint64_t &out_ts)
 {
@@
     fx.service().Read(table,
                       partition,
                       ServiceFixture::kShardId,
                       key,
                       &out_record,
                       &out_ts,
                       &out_ttl,
                       &result,
                       &done);
+    REQUIRE(done.ran_);
     return result.error_code();
 }
@@
 void Scan(ServiceFixture &fx,
           std::string_view table,
           int32_t partition,
           std::string_view start_key,
           std::string_view end_key,
@@
     fx.service().ScanNext(table,
                           partition,
                           ServiceFixture::kShardId,
                           start_key,
                           end_key,
                           inclusive_start,
                           inclusive_end,
                           scan_forward,
                           batch_size,
                           /*search_conditions=*/nullptr,
                           &items,
                           &session_id,
                           /*generate_session_id=*/false,
                           &result,
                           &done);
+    REQUIRE(done.ran_);
 }
@@
     fx.service().IncreaseWriteReqCount(ServiceFixture::kShardId);
     store.BatchWriteRecords(&req);
+    REQUIRE(done.ran_);

Also applies to: 165-248, 505-529

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tx_service/tests/MemDataStore-Test.cpp` around lines 133 - 160, The tests
call BatchWriteRecords using a NoopClosure and then immediately inspect the
CommonResult, which can miss whether the completion closure (SetFinish) actually
ran; replace the NoopClosure with an explicit completion closure that sets a
synchronization flag (e.g., a std::atomic<bool> or a
std::promise<void>/std::future) inside its Run/Invoke method and wait for that
flag/future to be signaled before asserting on result. Concretely, in the helper
Put (and the other similar helpers/tests noted) replace NoopClosure done with a
custom closure that captures a done_flag, calls done_flag.store(true) (or
promise.set_value()) when invoked, and after BatchWriteRecords wait for the
flag/future to be set (with a timeout) to ensure SetFinish ran before reading
CommonResult; reference the Put function, BatchWriteRecords call, NoopClosure,
and CommonResult when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tx_service/tests/harness/mem_data_store.cpp`:
- Around line 250-260: The test harness must preserve the real ScanNext
continuation contract instead of unconditionally calling req->ClearSessionId();
update MemDataStore::ScanNext so that if req->generate_session_id() or
req->has_session_id() is set it either (a) implements server-side continuation
by creating/looking up an iterator stored in scan_iter_cache_ keyed by
session_id (and sets a session_id in the response when generate_session_id() is
true) and resumes iteration from that iterator, or (b) if you choose not to
implement sessions, detect req->generate_session_id()/req->has_session_id() and
immediately return a clear error (fail fast) indicating server-side session
continuation is unsupported; do not silently drop the session_id via
req->ClearSessionId(). Ensure the changes touch the ScanNext handling around the
current req->ClearSessionId() call and update response session_id behavior
accordingly.

---

Outside diff comments:
In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 166-174: SetKVCatalogInfo currently deserializes into the existing
kv_info_ which can leave stale state; instead, allocate a fresh KVCatalogInfo,
call KVCatalogInfo::Deserialize on that new instance (so it starts from a clean
slate), then replace kv_info_ with the new instance (e.g., kv_info_ =
std::move(new_info)); this ensures subsequent Clone() reads the exact
deserialized state rather than inheriting leftover mappings from the prior
object.

---

Nitpick comments:
In `@tx_service/tests/MemDataStore-Test.cpp`:
- Around line 133-160: The tests call BatchWriteRecords using a NoopClosure and
then immediately inspect the CommonResult, which can miss whether the completion
closure (SetFinish) actually ran; replace the NoopClosure with an explicit
completion closure that sets a synchronization flag (e.g., a std::atomic<bool>
or a std::promise<void>/std::future) inside its Run/Invoke method and wait for
that flag/future to be signaled before asserting on result. Concretely, in the
helper Put (and the other similar helpers/tests noted) replace NoopClosure done
with a custom closure that captures a done_flag, calls done_flag.store(true) (or
promise.set_value()) when invoked, and after BatchWriteRecords wait for the
flag/future to be set (with a timeout) to ensure SetFinish ran before reading
CommonResult; reference the Put function, BatchWriteRecords call, NoopClosure,
and CommonResult when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 478ada08-b70b-4637-a027-7d04fae2005e

📥 Commits

Reviewing files that changed from the base of the PR and between 45a0e7e and c5fce28.

📒 Files selected for processing (11)
  • .github/workflows/unit-tests.yml
  • tx_service/include/tx_key.h
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/MemDataStore-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/TxConsistency-Test.cpp
  • tx_service/tests/harness/mem_data_store.cpp
  • tx_service/tests/harness/mem_data_store_factory.h
  • tx_service/tests/harness/test_node.cpp
  • tx_service/tests/harness/test_node.h
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (8)
  • tx_service/include/tx_key.h
  • .github/workflows/unit-tests.yml
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/harness/mem_data_store_factory.h
  • tx_service/tests/TxConsistency-Test.cpp
  • tx_service/tests/harness/test_node.cpp
  • tx_service/tests/harness/test_node.h

Comment thread tx_service/tests/harness/mem_data_store.cpp Outdated
…tract

Address review feedback that ScanNext's session-less re-seek diverges from the
real server-side-iterator scan contract. It does not: the production client
(SinglePartitionScanner::FetchNextBatch) always advances start_key to the last
returned key with inclusive_start=false, so re-seeking the ordered map from
start_key yields the same batch the server-side iterator would; the session_id
is only an iterator optimization. Fail-fast is not viable because the client
always sets generate_session_id=true.

- Clarify the ScanNext comment to state why session-less re-seek is contract-
  equivalent (and that a session-only resume without advancing start_key, which
  the client never does, would be the only divergence).
- Add a paginated multi-batch scan test (batch_size=2 over 5 keys) that drives
  ScanNext exactly as the client does (advance start_key, generate_session_id=
  true, carry session_id) and asserts a complete, in-order, gap-free result.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@liunyl liunyl merged commit e9899d5 into main Jun 13, 2026
5 checks passed
@liunyl liunyl deleted the feat/test-framework-phase1 branch June 13, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build a layered unit/cluster test framework for data_substrate (TestNode + TestCluster)

2 participants