Open-source log service, host manager, and tx-log-protos in-tree (BSL) #509
Fold the previously-separate eloq_log_service (log service) and raft_host_manager (host manager) repos, plus the tx-log-protos submodule, directly into data_substrate as in-tree source. They are now covered by the repository's root Business Source License 2.0 (like eloqstore); the open-vs-proprietary build toggles are removed.
- Absorb eloq_log_service/ and tx_service/raft_host_manager/ (were gitignored nested clones) and tx_service/tx-log-protos/ (was a submodule) as tracked files. Drop tx-log-protos's GPL/AGPL LICENSE (now under root BSL).
- Remove the open-source log_service submodule and build_log_service.cmake.
- Remove the OPEN_LOG_SERVICE option; always build the in-tree eloq_log_service via build_eloq_log_service.cmake under WITH_LOG_SERVICE.
- Remove the FORK_HM_PROCESS option; the host manager is always built, installed, and forked.
- Collapse the now-dead OPEN_LOG_SERVICE / FORK_HM_PROCESS preprocessor guards, keeping the eloq / forked code paths.
- Update module docs (01, 06, 07, 08, 10, README) and CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉
Walkthrough
The PR replaces prior log-service wiring with the in-tree
Changes
In-tree log service and host manager
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes
Companion PR: eloqdata/eloqkv#512 (bumps this submodule + removes the matching CI plumbing). Merge order: this PR first, then eloqkv#512.
Actionable comments posted: 14
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_log_service/.clang-format (1)
1-47: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Use the repository root `.clang-format` as the single source of formatting rules.
Adding `eloq_log_service/.clang-format` creates a subtree override and can diverge style from the required root configuration.
As per coding guidelines, C++ code should use clang-format configuration from the repository root `.clang-format`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/.clang-format` around lines 1 - 47, Remove the `.clang-format` file from the eloq_log_service directory, as it creates a subtree override that can cause C++ formatting to diverge from the repository's root configuration. The repository should rely solely on the root `.clang-format` file as the single source of truth for formatting rules across all directories and subdirectories.
Source: Coding guidelines
🟠 Major comments (37)
eloq_log_service/src/INIReader.cpp-52-53 (1)
52-53: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Typed getters do not enforce the documented fallback semantics.
`GetString` does not treat whitespace-only values as empty, and `GetInteger`/`GetReal` accept partial parses instead of requiring full numeric validity before returning parsed values.
Suggested fix

     string INIReader::GetString(const string &section, const string &name, const string &default_value) const
     {
         const string str = Get(section, name, "");
    -    return str.empty() ? default_value : str;
    +    const auto first = str.find_first_not_of(" \t\r\n");
    +    return (str.empty() || first == string::npos) ? default_value : str;
     }
     ...
     long INIReader::GetInteger(const string &section, const string &name, long default_value) const
     {
         string valstr = Get(section, name, "");
         const char *value = valstr.c_str();
         char *end;
         long n = strtol(value, &end, 0);
    -    return end > value ? n : default_value;
    +    return (end > value && *end == '\0') ? n : default_value;
     }
     ...
     double INIReader::GetReal(const string &section, const string &name, double default_value) const
     {
         string valstr = Get(section, name, "");
         const char *value = valstr.c_str();
         char *end;
         double n = strtod(value, &end);
    -    return end > value ? n : default_value;
    +    return (end > value && *end == '\0') ? n : default_value;
     }

Also applies to: 64-66, 75-77
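The full-consumption rule in the suggested fix can be tried in isolation. The sketch below is illustrative only: `IsBlank` and `ParseLongStrict` are hypothetical helper names, not part of `INIReader`, and the `ERANGE` check is an extra assumption that goes beyond what the review asks for.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper (not in the PR): true when the value is empty or
// consists only of whitespace.
inline bool IsBlank(const std::string &s)
{
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Hypothetical helper (not in the PR): returns default_value unless the
// whole string is a valid integer. Base 0 lets strtol auto-detect
// decimal, octal, and hex, as in the reviewed code.
inline long ParseLongStrict(const std::string &s, long default_value)
{
    if (IsBlank(s))
    {
        return default_value;
    }
    const char *begin = s.c_str();
    char *end = nullptr;
    errno = 0;
    const long n = std::strtol(begin, &end, 0);
    // end == begin  : nothing was parsed
    // *end != '\0'  : trailing characters, i.e. a partial parse
    // ERANGE        : value does not fit in long (extra check, assumption)
    if (end == begin || *end != '\0' || errno == ERANGE)
    {
        return default_value;
    }
    return n;
}
```

With these helpers, `ParseLongStrict("12abc", -1)` falls back to the default instead of returning 12, which is the behavior the finding asks the typed getters to adopt.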
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/INIReader.cpp` around lines 52 - 53, The GetString, GetInteger, and GetReal methods do not enforce the documented fallback semantics. For GetString at lines 52-53, after obtaining the string value via Get(), you must trim whitespace from the result and treat whitespace-only strings as empty before returning the default value. For GetInteger at lines 64-66 and GetReal at lines 75-77, you must validate that the numeric parsing consumes the entire string (not just a partial parse) by checking that no non-whitespace characters remain after the parse operation; if the parse is incomplete or invalid, return the default value instead of the partially parsed result.
eloq_log_service/CMakeLists.txt-284-287 (1)
284-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Check `protoc` execution result during configure.
`execute_process(...)` ignores failure status; if `protoc` fails, configure may continue and break later on generated source assumptions.
Suggested fix

     execute_process(
    -    COMMAND protoc ./${PROTO_NAME}.proto --cpp_out=./
    -    WORKING_DIRECTORY ${PROTO_SRC}
    +    COMMAND protoc ./${PROTO_NAME}.proto --cpp_out=./
    +    WORKING_DIRECTORY ${PROTO_SRC}
    +    RESULT_VARIABLE PROTOC_RESULT
    +    ERROR_VARIABLE PROTOC_STDERR
     )
    +if(NOT PROTOC_RESULT EQUAL 0)
    +    message(FATAL_ERROR "protoc failed for ${PROTO_NAME}.proto: ${PROTOC_STDERR}")
    +endif()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/CMakeLists.txt` around lines 284 - 287, The execute_process call for running protoc does not check whether the command succeeds or fails, which means CMake configuration will continue even if protoc fails to generate the required source files, causing build failures later. Add a RESULT_VARIABLE parameter to the execute_process call to capture the exit code, then check if the result is non-zero and use message(FATAL_ERROR ...) to halt the configuration with a clear error message if protoc execution fails. This ensures the configure step fails immediately when protoc cannot generate the necessary files rather than deferring the problem to the build phase.
build_tx_service.cmake-234-247 (1)
234-247: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Resolve braft include path before composing host_manager include dirs.
`HOST_MANAGER_INCLUDE_DIR` is assembled using `${BRAFT_INCLUDE_PATH}` before `find_path(BRAFT_INCLUDE_PATH ...)`, so the braft include path can be omitted from the include list.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@build_tx_service.cmake` around lines 234 - 247, The set(HOST_MANAGER_INCLUDE_DIR ...) command uses ${BRAFT_INCLUDE_PATH} before it has been defined by find_path(BRAFT_INCLUDE_PATH NAMES braft/raft.h). Move the find_path(BRAFT_INCLUDE_PATH...) call along with the validation block that checks if BRAFT_INCLUDE_PATH and BRAFT_LIB were found to execute before the set(HOST_MANAGER_INCLUDE_DIR...) call that references ${BRAFT_INCLUDE_PATH}. This ensures the BRAFT_INCLUDE_PATH variable is properly resolved before being added to the HOST_MANAGER_INCLUDE_DIR list.
build_tx_service.cmake-262-270 (1)
262-270: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Avoid loop-dependent proto source selection for `host_manager`.
`${PROTO_SRC}/${PROTO_NAME}.pb.cc` depends on the final value left by the earlier `FOREACH(PROTO_FILE ...)` loop, so the selected proto cc file is order-dependent and can miss required symbols.
Use a deterministic explicit proto source (or a curated list) for `RaftHM_SOURCES`.
Suggested fix

     SET(RaftHM_SOURCES
         ${HOST_MANAGER_SOURCE_DIR}/src/main.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager_service.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/ini.c
         ${HOST_MANAGER_SOURCE_DIR}/src/INIReader.cpp
    -    ${PROTO_SRC}/${PROTO_NAME}.pb.cc
    +    ${PROTO_CC_FILES}
         ${LOG_PROTO_SRC}/log_agent.cpp
         ${LOG_PROTO_SRC}/${LOG_PROTO_NAME}.pb.cc
     )

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@build_tx_service.cmake` around lines 262 - 270, The RaftHM_SOURCES variable uses ${PROTO_NAME}.pb.cc which is set by an earlier FOREACH loop over PROTO_FILE, making it order-dependent and fragile. Replace the loop-dependent ${PROTO_SRC}/${PROTO_NAME}.pb.cc reference in RaftHM_SOURCES with an explicit, deterministic proto source file name (either hardcode the specific proto file name that should be used for host_manager, or set a dedicated variable for the host_manager proto before the RaftHM_SOURCES definition) to ensure the correct proto symbols are always included regardless of loop iteration order.
CMakeLists.txt-245-246 (1)
245-246: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Guard `logservice` install with `WITH_LOG_SERVICE`.
`build_eloq_log_service.cmake` is included conditionally, but `install(TARGETS logservice ...)` is unconditional. With `-DWITH_LOG_SERVICE=OFF`, CMake can fail because `logservice` is undefined.
Suggested fix

    -if(WITH_LOG_SERVICE)
    -    INCLUDE(build_eloq_log_service.cmake)
    -endif()
    +if(WITH_LOG_SERVICE)
    +    INCLUDE(build_eloq_log_service.cmake)
    +endif()
     ...
    -install(TARGETS logservice
    -        ARCHIVE DESTINATION lib
    -        LIBRARY DESTINATION lib)
    +if(WITH_LOG_SERVICE)
    +    install(TARGETS logservice
    +            ARCHIVE DESTINATION lib
    +            LIBRARY DESTINATION lib)
    +endif()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CMakeLists.txt` around lines 245 - 246, The install(TARGETS logservice ...) command is being executed unconditionally, but the logservice target is only defined when the INCLUDE(build_eloq_log_service.cmake) line is executed inside the if() block. Move the install(TARGETS logservice ...) command inside the same if() block that guards the INCLUDE(build_eloq_log_service.cmake) statement, so that the install command only runs when WITH_LOG_SERVICE is enabled and the logservice target actually exists.
eloq_log_service/include/replay_service_for_test.h-24-30 (1)
24-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Guard the controller downcast before dereference.
`dynamic_cast` may return `nullptr`; dereferencing `*cntl` before validation can crash the RPC handler.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/replay_service_for_test.h` around lines 24 - 30, The dynamic_cast of controller to brpc::Controller pointer is not validated before being dereferenced in the brpc::StreamAccept call. Add a null check immediately after the dynamic_cast to verify that cntl is not nullptr, and if it is nullptr, handle the error appropriately (such as setting a failure status and logging an error) before attempting to dereference cntl with the * operator or pass it to any functions.
eloq_log_service/include/log_state_rocksdb_impl.h-62-75 (1)
62-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`Deserialize` needs runtime key-length validation in release builds.
`assert(key.size() == 20)` disappears in optimized builds; corrupted keys will still be memcpy’d, causing out-of-bounds reads.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_state_rocksdb_impl.h` around lines 62 - 75, The key-length validation in the Deserialize method relies on assert(key.size() == 20), which is compiled out in release builds, leaving the subsequent memcpy operations vulnerable to out-of-bounds reads if a corrupted or incorrectly-sized key is passed. Replace the assert with an explicit runtime validation check (such as an if statement) that verifies key.size() equals 20 bytes and either returns an error code or throws an exception if the validation fails, ensuring this safety check persists across all build configurations.
eloq_log_service/test/proto/test.proto-2-2 (1)
2-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Package path does not satisfy Buf’s directory rule.
`package txlog` in this location violates `PACKAGE_DIRECTORY_MATCH`; this will fail Buf lint unless the file is moved or package/directory conventions are aligned.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/proto/test.proto` at line 2, The package declaration `txlog` in test.proto does not match the directory structure where the file is located, violating Buf's PACKAGE_DIRECTORY_MATCH rule. Update the package statement to align with the directory hierarchy: the package name should reflect the path `eloq_log_service/test/proto/` (such as `eloq_log_service.test.proto`) to satisfy Buf's naming conventions and allow the linter to pass.
Source: Linters/SAST tools
eloq_log_service/include/glog_error_logging.h-67-70 (1)
67-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Avoid throwing exceptions from log directory setup.
`std::filesystem::create_directories` can throw; this init path should use `std::error_code` and fallback behavior instead of risking process termination.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/glog_error_logging.h` around lines 67 - 70, The std::filesystem::create_directories call in the directory existence check can throw an exception and terminate the process. Replace the call to create_directories(FLAGS_log_dir) with the overload that accepts a std::error_code reference parameter, then check the error code and implement appropriate fallback behavior (such as logging a warning or using an alternative log directory) instead of allowing the exception to propagate.
eloq_log_service/include/log_state_rocksdb_impl.h-102-103 (1)
102-103: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Shard range initialization ignores the `start_ts` filter.
Per-worker iterators are partitioned from global DB `first_ts`, not from requested `start_ts`, so replay scans can include records older than requested.
Also applies to: 121-133
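One way to honor `start_ts` is to clamp the lower bound before partitioning. The following is a minimal sketch under stated assumptions: timestamps are `uint64_t`, each worker gets one contiguous inclusive range, and `ShardRanges` with its parameters is a hypothetical name rather than the PR's actual partitioning code.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not in the PR): split [max(first_ts, start_ts), last_ts]
// into at most `workers` contiguous inclusive ranges [begin, end].
inline std::vector<std::pair<uint64_t, uint64_t>> ShardRanges(
    uint64_t first_ts, uint64_t last_ts, uint64_t start_ts, uint32_t workers)
{
    std::vector<std::pair<uint64_t, uint64_t>> ranges;
    // Clamp the lower bound so no shard starts before the requested start_ts.
    const uint64_t lo = std::max(first_ts, start_ts);
    if (workers == 0 || lo > last_ts)
    {
        return ranges;  // nothing to replay
    }
    const uint64_t span = last_ts - lo;         // inclusive width minus one
    const uint64_t step = span / workers + 1;   // timestamps per shard, >= 1
    uint64_t begin = lo;
    while (true)
    {
        // The last shard absorbs the remainder; comparing distances avoids
        // overflow in begin + step.
        const uint64_t end =
            (last_ts - begin < step) ? last_ts : begin + step - 1;
        ranges.emplace_back(begin, end);
        if (end == last_ts)
        {
            break;
        }
        begin = end + 1;
    }
    return ranges;
}
```

For example, with stored records spanning 100 to 199 and a requested `start_ts` of 150, two workers would receive 150 to 174 and 175 to 199, so nothing older than the request is scanned.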
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_state_rocksdb_impl.h` around lines 102 - 103, The initialization of first_ts and last_ts variables is hardcoded to 0, which ignores the start_ts filter parameter and causes replay scans to include records older than requested. Locate where first_ts and last_ts are initialized to 0 and change them to initialize from the start_ts parameter instead. This same fix needs to be applied at the additional location mentioned in the comment (around lines 121-133) where similar timestamp range initialization occurs. This ensures per-worker iterators respect the requested start_ts boundary rather than using the global database's first_ts.
eloq_log_service/include/log_service_metrics.h-5-5 (1)
5-5: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use an atomic type for this global runtime flag.
A mutable global `bool` can race under concurrent reads/writes; metrics toggles are typically touched from multiple threads.
Suggested fix

    +#include <atomic>
    @@
    -inline bool enable_log_service_metrics = false;
    +inline std::atomic_bool enable_log_service_metrics{false};

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_service_metrics.h` at line 5, The global variable enable_log_service_metrics is declared as a plain bool, which is not thread-safe for concurrent access from multiple threads and can lead to data races. Replace the bool type with std::atomic<bool> to ensure atomic operations and thread-safe access to this runtime flag that is shared across threads.
eloq_log_service/include/rocksdb_cloud_config.h-166-180 (1)
166-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add a fallback when no cloud backend macro is selected.
`status` is only defined in S3/GCS branches. With `LOG_STATE_TYPE_RKDB_CLOUD` but no backend macro, this function fails to compile.
Suggested fix

     inline rocksdb::Status NewCloudFileSystem(
         const rocksdb::CloudFileSystemOptions &cfs_options,
         rocksdb::CloudFileSystem **cfs)
     {
     #if defined(LOG_STATE_TYPE_RKDB_S3)
    @@
     #elif defined(LOG_STATE_TYPE_RKDB_GCS)
    @@
    +#else
    +    return rocksdb::Status::InvalidArgument(
    +        "No cloud backend selected: define LOG_STATE_TYPE_RKDB_S3 or LOG_STATE_TYPE_RKDB_GCS");
     #endif
    -    return status;
    +    return status;
     };

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/rocksdb_cloud_config.h` around lines 166 - 180, The NewCloudFileSystem function declares the status variable only within the LOG_STATE_TYPE_RKDB_S3 and LOG_STATE_TYPE_RKDB_GCS preprocessor conditional branches, but the function unconditionally returns status at the end. When neither backend macro is defined, the status variable is undeclared, causing compilation failure. Add an `#else` clause after the `#elif` defined(LOG_STATE_TYPE_RKDB_GCS) block to initialize status with an appropriate error Status value (such as a Status indicating no cloud backend is configured) to ensure status is always defined before the return statement.
tx_service/raft_host_manager/CMakeLists.txt-48-49 (1)
48-49: ⚠️ Potential issue | 🟠 Major
Dependency resolution is not enforcing what the script intends.
`find_library(... VERSION ">=0.6.0" ...)` does not enforce version constraints: CMake's `find_library` command does not recognize or honor the VERSION parameter. Additionally, `${OPENSSL_LIB}` is used at line 109 but never defined (the corresponding `find_library` call is commented out in the parent CMakeLists.txt), which will silently result in an empty variable during linking.
Use `find_package(glog 0.6 CONFIG REQUIRED)` instead to properly enforce version requirements, and use modern CMake targets (`glog::glog`, `OpenSSL::SSL`, `OpenSSL::Crypto`) for correct dependency management.
Suggested fix

    -  find_library(GLOG_LIB NAMES glog VERSION ">=0.6.0" REQUIRED)
    +  find_package(glog 0.6 CONFIG REQUIRED)
    @@
    -  set(LINK_LIB ${LINK_LIB} ${GLOG_LIB})
    +  set(LINK_LIB ${LINK_LIB} glog::glog)
    @@
    -  ${OPENSSL_LIB}
    +  OpenSSL::SSL
    +  OpenSSL::Crypto

Also applies to: 101-110
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/raft_host_manager/CMakeLists.txt` around lines 48 - 49, The find_library command for glog does not enforce version constraints because CMake's find_library does not recognize the VERSION parameter. Additionally, ${OPENSSL_LIB} is used but never defined since the corresponding find_library call is commented out. Replace the glog find_library call with find_package(glog 0.6 CONFIG REQUIRED) to properly enforce version requirements and use the modern CMake target glog::glog. Similarly, add find_package(OpenSSL REQUIRED) and use the modern CMake targets OpenSSL::SSL and OpenSSL::Crypto instead of the raw ${OPENSSL_LIB} variable to ensure correct dependency management and version enforcement throughout the CMakeLists.txt file.
eloq_log_service/src/launch_server.cpp-17-31 (1)
17-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`run_direct()` ignores all its parameters.
The function accepts `ip`, `port`, `id`, `raft_conf`, `group`, `storage_path` parameters but constructs `RaftServer` using `FLAGS_*` globals instead. This appears to be a copy-paste error.
🐛 Proposed fix

     void run_direct(std::string ip, uint32_t port, uint32_t id, std::string raft_conf, std::string group, std::string storage_path)
     {
    -    txlog::RaftServer raftServer(FLAGS_ip,
    -                                 FLAGS_port,
    -                                 FLAGS_id,
    -                                 FLAGS_raft_conf,
    -                                 FLAGS_group,
    -                                 FLAGS_storage_path);
    +    txlog::RaftServer raftServer(ip,
    +                                 port,
    +                                 id,
    +                                 raft_conf,
    +                                 group,
    +                                 storage_path);
         raftServer.Run();
     }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/launch_server.cpp` around lines 17 - 31, The run_direct() function accepts parameters (ip, port, id, raft_conf, group, storage_path) but ignores them by constructing RaftServer using FLAGS_* global variables instead. Replace each FLAGS_* reference in the RaftServer constructor call with its corresponding function parameter: FLAGS_ip becomes ip, FLAGS_port becomes port, FLAGS_id becomes id, FLAGS_raft_conf becomes raft_conf, FLAGS_group becomes group, and FLAGS_storage_path becomes storage_path.
eloq_log_service/src/launch_sv.cpp-596-619 (1)
596-619: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Invalid `log_purger_schedule` causes silent server non-start.
When parsing fails, only an error is logged but `launch()` is never called. The process appears to start normally but the server isn't running. Consider either returning an error code or exiting with failure.
🛠️ Suggested fix

     if (iss.fail())
     {
         LOG(ERROR) << "The argument `log_purger_schedule` has invalid time format. "
                       "expected: HH:MM:SS";
    +    if (!FLAGS_alsologtostderr)
    +    {
    +        std::cout << "Failed to start LogServer, invalid log_purger_schedule format"
    +                  << std::endl;
    +    }
    +    return -1;
     }
    -else
    -{
    +    rocksdb_cloud_config.log_purger_starting_hour_ = log_purger_tm.tm_hour;
         // ... rest of launch logic
    -}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/launch_sv.cpp` around lines 596 - 619, When parsing of log_purger_schedule fails in the if (iss.fail()) block, the code only logs an error but never calls launch(), causing the server to not actually start while appearing successful. After logging the error for the invalid time format, add an explicit error handling mechanism such as returning an error code from the function or calling a process exit function with a failure status to prevent silent server non-start. Ensure the else block containing the successful launch() call remains unchanged.
eloq_log_service/include/log_utils.h-48-53 (1)
48-53: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add suffix-length guard before offset math in `ends_with`.
At Line 51, `str.size() - suffix.size()` is unsafe when `suffix` is longer than `str`. Add an early `if (str.size() < suffix.size()) return false;`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_utils.h` around lines 48 - 53, The ends_with function performs unsafe arithmetic on unsigned values when calculating the offset for str.compare(). Add a guard condition at the beginning of the ends_with function that returns false immediately if str.size() is less than suffix.size(). This prevents integer underflow when suffix is longer than str, ensuring the subsequent str.size() - suffix.size() operation is safe.
eloq_log_service/src/log_state_rocksdb_impl.cpp-250-295 (1)
250-295: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Release checkpoint object after snapshot creation.
`checkpoint` allocated at Line 250/251 is never released in this function. Repeated snapshots will leak memory/resources.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_state_rocksdb_impl.cpp` around lines 250 - 295, The checkpoint object created by rocksdb::Checkpoint::Create is never released before the function returns, causing a memory leak on repeated snapshot calls. Add a delete statement to release the checkpoint pointer before the return res statement at the end of the function, ensuring the allocated checkpoint resource is properly cleaned up.
eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp-952-977 (1)
952-977: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Do not call `observer_->OnInMemStateFull` while holding `in_mem_state_mutex_`.
At Line 974, external callback dispatch occurs under `in_mem_state_mutex_`. If observer execution is synchronous or re-entrant, this can deadlock or significantly stall writers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp` around lines 952 - 977, The observer callback `OnInMemStateFull` is being invoked while holding the `in_mem_state_mutex_` lock, which can cause deadlock or writer stalling if the callback is synchronous or re-entrant. Release the lock by ending the unique_lock scope before calling the observer callback. Capture the necessary parameter values (such as `log_count_before_purge_`) into local variables while still holding the lock, then release the lock before passing those values to `observer_->OnInMemStateFull`. The lambda callback `done` can continue to acquire the lock when needed to update state after the purge completes.
eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp-1791-1798 (1)
1791-1798: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Free `SstFileReader` iterators in all paths during refill.
The iterator created at Line 1791 is never deleted, including early-return error paths. Refill over many files will leak iterators and degrade long-running stability.
Also applies to: 1800-1815
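Owning the raw pointer with `std::unique_ptr` is one way to get cleanup on every return path, and the same pattern would fit the checkpoint finding above. The sketch below deliberately uses a stand-in `FakeIterator` type instead of the real RocksDB classes so that it compiles on its own; `NewRawIterator` and `RefillFromFile` are hypothetical names, and the `live` counter exists only to make the cleanup observable.

```cpp
#include <memory>

// Stand-in for an iterator handed out as a raw owning pointer (assumption:
// the real object is also released with `delete`). Counts live instances.
struct FakeIterator
{
    static inline int live = 0;
    bool ok;
    explicit FakeIterator(bool status_ok) : ok(status_ok) { ++live; }
    ~FakeIterator() { --live; }
    bool StatusOk() const { return ok; }
};

// Stand-in factory mimicking an API that returns a caller-owned pointer.
inline FakeIterator *NewRawIterator(bool status_ok)
{
    return new FakeIterator(status_ok);
}

// Hypothetical refill step: the unique_ptr releases the iterator on both
// the early-return error path and the normal path.
inline bool RefillFromFile(bool status_ok)
{
    std::unique_ptr<FakeIterator> it(NewRawIterator(status_ok));
    if (!it->StatusOk())
    {
        return false;  // iterator is freed here by unique_ptr
    }
    // ... iterate and load entries ...
    return true;  // and freed here as well
}
```

After any sequence of successful and failed `RefillFromFile` calls, `FakeIterator::live` returns to zero, which is the property the finding asks for in the refill loop.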
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp` around lines 1791 - 1798, The iterator created from sst_reader.NewIterator at line 1791 is never deleted, causing a memory leak especially in the error path where the function returns false. You must ensure the iterator is properly freed in all code paths within the refill operation, including the error handling block at lines 1794-1797 and any other paths up to line 1815. Use RAII pattern with a scoped cleanup mechanism or add explicit delete statements before every return path to guarantee the iterator pointed to by it is deallocated. Identify all locations where iterators are created and returned without cleanup, and apply consistent cleanup logic across all affected code paths in the refill operation.
eloq_log_service/include/log_utils.h-23-27 (1)
23-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Align size validation with integer-only byte-size parsing.
`is_number` currently accepts decimal/scientific/signed input, but `parse_size` treats input as integer capacity. Values like `1.5GB` or negative values pass validation and can be parsed incorrectly.
Also applies to: 59-81, 101-113
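A digits-only check is enough to keep validation aligned with integer byte-size parsing. This sketch is illustrative: `IsUnsignedInteger` and `ParseSizeBytes` are hypothetical names, and the accepted suffixes (uppercase KB, MB, GB, TB as powers of 1024) are an assumption rather than the documented behavior of `parse_size` in the PR.

```cpp
#include <cctype>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical validator: accepts only non-empty strings of decimal digits,
// so "1.5", "-1", "1e3", and "+7" are all rejected.
inline bool IsUnsignedInteger(const std::string &s)
{
    if (s.empty())
    {
        return false;
    }
    for (unsigned char c : s)
    {
        if (!std::isdigit(c))
        {
            return false;
        }
    }
    return true;
}

// Hypothetical parser: "<digits><unit>" -> bytes, or nullopt on any error.
inline std::optional<uint64_t> ParseSizeBytes(const std::string &s)
{
    if (s.size() < 3)
    {
        return std::nullopt;  // need at least one digit plus a 2-char unit
    }
    const std::string unit = s.substr(s.size() - 2);
    const std::string digits = s.substr(0, s.size() - 2);
    uint64_t mult = 0;
    if (unit == "KB") mult = 1ULL << 10;
    else if (unit == "MB") mult = 1ULL << 20;
    else if (unit == "GB") mult = 1ULL << 30;
    else if (unit == "TB") mult = 1ULL << 40;
    else return std::nullopt;
    if (!IsUnsignedInteger(digits))
    {
        return std::nullopt;
    }
    constexpr uint64_t kMax = std::numeric_limits<uint64_t>::max();
    uint64_t value = 0;
    for (char c : digits)
    {
        const uint64_t d = static_cast<uint64_t>(c - '0');
        if (value > (kMax - d) / 10)
        {
            return std::nullopt;  // value * 10 + d would overflow
        }
        value = value * 10 + d;
    }
    if (value > kMax / mult)
    {
        return std::nullopt;  // value * mult would overflow
    }
    return value * mult;
}
```

Returning `std::optional` keeps "invalid input" distinct from a legitimate size of zero, which a sentinel return value would blur.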
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_utils.h` around lines 23 - 27, The is_number function's regex pattern accepts decimal values, scientific notation, and signed numbers, but parse_size (referenced in the affected ranges at lines 59-81 and 101-113) treats input as integer-only byte capacity, causing values like "1.5GB" or negative numbers to incorrectly pass validation. Update the regex pattern in is_number to match only unsigned integer formats (digits only, no decimal point, no scientific notation, no sign). Then verify that parse_size and any related size parsing logic at the consolidated sites (lines 59-81, 101-113) correctly handle the restricted input format and reject non-integer or negative inputs appropriately.
eloq_log_service/src/fault_inject.cpp-17-19 (1)
17-19: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fix inverted strike-window condition in trigger gating.
Line 17-18 rejects the in-range case. The guard should return when `count_strike_` is outside `[start_strike_, end_strike_]`, not inside it.
Proposed fix

    -    if (entry->start_strike_ < entry->count_strike_ ||
    -        entry->end_strike_ > entry->count_strike_)
    +    if (entry->count_strike_ < entry->start_strike_ ||
    +        entry->count_strike_ > entry->end_strike_)
         {
             return;
         }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/fault_inject.cpp` around lines 17 - 19, The condition on lines 17-18 has inverted inequality comparisons that cause it to reject when count_strike_ is inside the range [start_strike_, end_strike_] instead of outside. Fix the guard condition by inverting the comparisons: change "entry->start_strike_ < entry->count_strike_" to "entry->count_strike_ < entry->start_strike_" and change "entry->end_strike_ > entry->count_strike_" to "entry->count_strike_ > entry->end_strike_" so the condition correctly returns when count_strike_ falls outside the valid strike window.
tx_service/raft_host_manager/include/INIReader.h-41-106 (1)
41-106: ⚠️ Potential issue | 🟠 Major
Wrap INIReader in module-specific namespaces to comply with coding guidelines and prevent symbol collisions.
`INIReader` is declared in global scope, but your codebase includes three separate implementations:
- tx_service/raft_host_manager/include/INIReader.h
- eloq_log_service/include/INIReader.h
- store_handler/eloq_data_store_service/INIReader.h
All three are compiled into the same `data_substrate` library, creating an ODR (One Definition Rule) violation risk. Your CMakeLists.txt acknowledges this ("The INIReader definition in the eloqstore library was renamed to avoid clashing with the other INIReader implementations"), but the actual code still declares all three in global scope.
Per coding guidelines, C++ code must be organized within namespaces. Move each `INIReader` class into its respective namespace (e.g., `namespace txservice`, `namespace eloq_log`, `namespace eloq_data_store`) to eliminate the collision risk and comply with project standards.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx_service/raft_host_manager/include/INIReader.h` around lines 41 - 106, The INIReader class is declared in global scope across three separate implementations in different parts of the codebase, creating an ODR (One Definition Rule) violation risk when all three are compiled into the same library. Wrap the INIReader class declaration in appropriate module-specific namespaces (e.g., txservice for the raft_host_manager, eloq_log for the eloq_log_service, and eloq_data_store for the store_handler implementation) to prevent symbol collisions and comply with coding guidelines. Ensure that the namespace wrapping is applied consistently across all three INIReader implementations in their respective headers.
Source: Coding guidelines
eloq_log_service/src/log_state_memory_impl.cpp-88-110 (1)
88-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use binary mode and check stream state in snapshot I/O paths.

Line [92] and Line [105] open snapshot files in default text mode, but the format is binary (`read`/`write` on integers and raw bytes). Also, read/write results are not validated, so short I/O can silently produce corrupted in-memory state.

Also applies to: 136-170, 173-207
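A rough sketch of the pattern described above: open the file with `std::ios::binary` and only accept a value after confirming the stream state. `WriteSnapshotValue` and `ReadSnapshotValue` are hypothetical helpers for illustration, not functions from `LogStateMemoryImpl`.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: writes one fixed-width integer in binary mode and
// reports whether the stream is still healthy afterwards.
bool WriteSnapshotValue(const std::string &path, uint64_t value)
{
    std::ofstream os(path, std::ios::binary | std::ios::trunc);
    if (!os.is_open())
    {
        return false;
    }
    os.write(reinterpret_cast<const char *>(&value), sizeof(value));
    os.flush();
    return os.good();
}

// Hypothetical helper: reads into a temporary and only publishes it to the
// caller once the full width has been read, so a truncated file cannot leave
// `value` half-updated.
bool ReadSnapshotValue(const std::string &path, uint64_t &value)
{
    std::ifstream is(path, std::ios::binary);
    if (!is.is_open())
    {
        return false;
    }
    uint64_t tmp = 0;
    is.read(reinterpret_cast<char *>(&tmp), sizeof(tmp));
    if (!is || is.gcount() != static_cast<std::streamsize>(sizeof(tmp)))
    {
        return false;
    }
    value = tmp;
    return true;
}
```

Reading into a temporary first is a design choice that matches the "fail before mutating state" idea: the caller's variable changes only on a complete, successful read.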
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_state_memory_impl.cpp` around lines 88 - 110, The ReadSnapshot and WriteSnapshot methods in LogStateMemoryImpl are opening snapshot files in text mode but performing binary I/O operations (reading/writing integers and raw bytes), which can cause data corruption. Additionally, the results of read/write operations are not being validated, allowing silent failures. Fix this by opening both ifstream in ReadSnapshot and ofstream in WriteSnapshot with the std::ios::binary flag, and after each load/write operation, check the stream state using fail() or similar validation to ensure the I/O completed successfully before proceeding. If stream validation fails, return an error code or log the failure appropriately.

eloq_log_service/include/log_state.h-1241-1360 (1)
1241-1360: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fail fast on snapshot read/parse errors before mutating in-memory state.

The deserialization path reads lengths/messages and parses protobuf payloads without checking `istream` state or `ParseFromString` results. On truncated/corrupt snapshots, this can partially load invalid state and poison recovery.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_state.h` around lines 1241 - 1360, The snapshot deserialization code reads binary data and parses protobuf messages without validating the input stream state or parse results before inserting into in-memory state containers. Add error checking after each is.read() call to verify the stream is in a good state, and check the return value of each ParseFromString() call to ensure successful deserialization. If any read fails (stream state is bad) or if ParseFromString returns false, log the error and exit the deserialization immediately before calling try_emplace() on cc_ng_info_, tx_catalog_ops_, tx_split_range_ops_, or tx_cluster_scale_ops_, to prevent poisoning recovery with partially-loaded or corrupted data.

eloq_log_service/include/log_service.h-345-348 (1)
345-348: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not log cloud credentials in plaintext.

Line 347 logs `aws_secret_key` directly. This leaks secrets to service logs.

Suggested fix

 #if defined(LOG_STATE_TYPE_RKDB_S3)
     << ", aws_access_key_id = " << aws_access_key_id
-    << ", aws_secret_key = " << aws_secret_key
+    << ", aws_secret_key = [REDACTED]"
 #endif

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_service.h` around lines 345 - 348, The aws_secret_key is being logged in plaintext within the LOG_STATE_TYPE_RKDB_S3 conditional block, which is a security vulnerability as it exposes sensitive credentials to service logs. Replace the plaintext logging of aws_secret_key with a redacted or masked representation such as a placeholder string like "[REDACTED]" or a truncated version that does not expose the actual secret value.

eloq_log_service/include/log_service.h-386-393 (1)
386-393: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`node_id` validation can be bypassed by integer narrowing.

On Line 386, casting `node_id` to `uint16_t` can wrap large invalid values and let them pass the check, then fail later with out-of-range access.

Suggested fix

-    if ((uint16_t) node_id >= ip_port_list.size())
+    if (node_id < 0 ||
+        static_cast<size_t>(node_id) >= ip_port_list.size())
     {
         std::string err_msg = "Invalid configuration: `node_id` must be less than node size, "

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_service.h` around lines 386 - 393, The validation check on line 386 uses an unsafe cast that narrows the `node_id` value to `uint16_t` before comparison, allowing large invalid values to wrap around and bypass the bounds check, leading to out-of-range access later. Remove the cast to `uint16_t` in the condition and compare `node_id` directly against `ip_port_list.size()` so that invalid values are properly caught without integer wrapping.

eloq_log_service/include/log_service.h-910-918 (1)
910-918: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`Start()` may report success after partial startup failure.

Lines 913-916 overwrite `err_code` per replica. If an early replica fails and a later one succeeds, this returns `0` and masks failure.

Suggested fix

 int Start()
 {
-    int err_code = 0;
     for (auto &logger_pair : log_replicas_)
     {
-        err_code = logger_pair.second.Start();
+        int err_code = logger_pair.second.Start();
+        if (err_code != 0)
+        {
+            return err_code;
+        }
     }
-
-    return err_code;
+    return 0;
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_service.h` around lines 910 - 918, The Start() method overwrites the err_code variable in each iteration of the loop over log_replicas_, which causes it to report success if the last replica succeeds even if earlier replicas failed. Instead of unconditionally overwriting err_code with the result of each logger_pair.second.Start() call, modify the loop to preserve error codes from earlier replicas. Either accumulate errors using a logical OR operation (err_code = err_code || logger_pair.second.Start()) so that any failure is preserved, or use early exit logic (if (err_code == 0) { err_code = logger_pair.second.Start(); }) to return the first encountered error. This ensures the method only returns success if all replicas start successfully.

eloq_log_service/include/log_shipping_agent.h-79-94 (1)
79-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retry loop never re-attempts channel initialization.

On Line 79, `while (err != 0)` loops on a stale `err` value; `channel_.Init(...)` is never called again inside the loop, so a transient failure becomes a permanent sleep loop.

Suggested fix

-    while (err != 0)
+    auto init_channel = [&]() -> int
+    {
+        if (0 != butil::str2ip(node_ip_str.c_str(), &ip_t))
+        {
+            std::string naming_service_url;
+            braft::HostNameAddr hostname_addr(node_ip_str, node_port);
+            braft::HostNameAddr2NSUrl(hostname_addr, naming_service_url);
+            return channel_.Init(naming_service_url.c_str(),
+                                 braft::LOAD_BALANCER_NAME,
+                                 &options);
+        }
+        return channel_.Init(full_ip_.c_str(), &options);
+    };
+
+    while (err != 0)
     {
         if (shipping_agent_status_.load(
                 std::memory_order_acquire) != Status::Active)
@@
         LOG(ERROR) << "Log shipping agent of the log group #"
                    << log_group_id_
                    << " fails to connect to the cc node at " << full_ip_;
         using namespace std::chrono_literals;
         std::this_thread::sleep_for(1s);
+        err = init_channel();
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/include/log_shipping_agent.h` around lines 79 - 94, The retry loop conditional on `while (err != 0)` continuously checks a stale error value without re-attempting the initialization. The `channel_.Init(...)` method call must be invoked again inside the loop body after the sleep to actually retry the connection; otherwise, the loop will sleep indefinitely without ever retrying. Move the channel initialization call (that sets the `err` value) into the loop so that each iteration attempts to re-initialize the channel and updates the `err` variable before the next loop condition check.

eloq_log_service/test/launch_replay_service.cpp-32-35 (1)
32-35: ⚠️ Potential issue | 🟠 Major

Parse command-line flags and propagate server startup status from `main`.

The `main` function currently ignores both CLI flag parsing and the return code from `start_replay_rpc_server()`. This means `FLAGS_replay_port` overrides passed on the command line won't take effect (only the default value 8888 is used), and startup failures will still cause the process to exit successfully.

Suggested fix

 int main(int argc, char *argv[])
 {
-    start_replay_rpc_server();
+    GFLAGS_NAMESPACE::ParseCommandLineFlags(&argc, &argv, true);
+    return start_replay_rpc_server();
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/launch_replay_service.cpp` around lines 32 - 35, The main function needs to parse command-line flags before starting the server and propagate the server startup status as the process exit code. Add a call to parse command-line arguments (such as gflags::ParseCommandLineFlags) at the beginning of main to ensure CLI flags like FLAGS_replay_port are processed correctly, then capture the return value from start_replay_rpc_server() and use it as the return code for main so that startup failures result in a non-zero exit code.

eloq_log_service/test/rocksdb_cloud_delete_range_test.cpp-300-302 (1)
300-302: ⚠️ Potential issue | 🟠 Major

Check and handle `DeleteFilesInRange` failures.

The function returns a `rocksdb::Status` that must be checked. If deletion fails, the test continues with misleading results. The file already handles Status checks consistently (see lines 224–230 and 253–257).

Suggested fix

-    rocksdb::DeleteFilesInRange(
-        db, db->DefaultColumnFamily(), &lower_bound, &upper_bound);
+    auto delete_status = rocksdb::DeleteFilesInRange(
+        db, db->DefaultColumnFamily(), &lower_bound, &upper_bound);
+    if (!delete_status.ok())
+    {
+        LOG(ERROR) << "DeleteFilesInRange failed for ng_id " << ng_id
+                   << ": " << delete_status.ToString();
+        return -1;
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/rocksdb_cloud_delete_range_test.cpp` around lines 300 - 302, The rocksdb::DeleteFilesInRange call is not checking the returned rocksdb::Status, which means if deletion fails the test continues with invalid results. Capture the Status returned by DeleteFilesInRange and check it for errors following the existing pattern already used in the file at lines 224–230 and 253–257. Handle any error appropriately (typically by asserting or returning early from the test) to prevent misleading test results.

eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp-151-164 (1)
151-164: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Synchronize captured replay messages instead of relying on sleeps.

`on_received_messages()` mutates `replay_messages_` on BRPC stream callbacks while the test thread reads and clears the same vector after fixed sleeps. Guard the vector with a mutex or signal replay completion before assertions/clears to avoid data races and flaky results.

Also applies to: 1105-1202, 1270-1306, 1389-1454, 1536-1591, 1670-1696
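One way to picture the synchronization suggested here, as a sketch using only the standard library. `ReplayCollector`, `OnReceived`, `WaitAndTake`, and `RunReplayDemo` are hypothetical names; in the real test the producer would be the BRPC stream callback rather than a `std::thread`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical collector standing in for the test's shared message vector.
class ReplayCollector
{
public:
    // Called from the callback thread: append under the lock, then wake
    // any waiter.
    void OnReceived(std::string msg, bool is_finish)
    {
        {
            std::lock_guard<std::mutex> lk(mux_);
            messages_.push_back(std::move(msg));
            finished_ = finished_ || is_finish;
        }
        cv_.notify_all();
    }

    // Called from the test thread: block until the finish message arrives
    // (or the timeout expires), then take ownership of the captured messages.
    bool WaitAndTake(std::chrono::milliseconds timeout,
                     std::vector<std::string> &out)
    {
        std::unique_lock<std::mutex> lk(mux_);
        if (!cv_.wait_for(lk, timeout, [this] { return finished_; }))
        {
            return false;
        }
        out = std::move(messages_);
        messages_.clear();
        finished_ = false;
        return true;
    }

private:
    std::mutex mux_;
    std::condition_variable cv_;
    std::vector<std::string> messages_;
    bool finished_{false};
};

// Demo: a producer thread plays the callback role; returns how many
// messages the waiting thread observed (0 on timeout).
std::size_t RunReplayDemo()
{
    ReplayCollector collector;
    std::thread producer([&collector] {
        collector.OnReceived("record-1", false);
        collector.OnReceived("record-2", false);
        collector.OnReceived("finish", true);
    });
    std::vector<std::string> got;
    bool ok = collector.WaitAndTake(std::chrono::seconds(5), got);
    producer.join();
    return ok ? got.size() : 0;
}
```

Waiting on a completion flag replaces the fixed sleep, so the test thread neither reads the vector mid-write nor asserts before replay has finished.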
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp` around lines 151 - 164, The on_received_messages() method mutates the replay_messages_ vector from a BRPC stream callback thread while the test thread reads and clears it after fixed sleeps, creating a data race. Add a mutex to guard all accesses to replay_messages_ in the on_received_messages() method (at lines 151-164 anchor) and at all other locations where replay_messages_ is accessed: lines 1105-1202, 1270-1306, 1389-1454, 1536-1591, and 1670-1696. Additionally, consider adding a condition variable to signal when replay is complete instead of relying on fixed sleeps, ensuring proper synchronization between the BRPC callback thread and the test thread before assertions or clears on the shared vector.

eloq_log_service/src/log_instance.cpp-636-699 (1)
636-699: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include cluster-scale and migration logs in retry success detection.

The retry path only re-checks DataLog/SchemaLog/SplitRangeLog. A previously committed `kClusterScaleLog` or `kMigrationLog` retried after a term mismatch will fall through as `Fail`, even though normal execution later treats those log types as duplicate-aware writes. Add searches for `SearchTxClusterScaleOp()` and `SearchTxDataMigrationOp()` here before returning failure.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_instance.cpp` around lines 636 - 699, The switch statement on log_content.content_case() is missing handlers for kClusterScaleLog and kMigrationLog log types. When these log types are retried after a term mismatch, they fall through to the default case and are incorrectly marked as failures. Add two new cases before the default case: one for LogContentMessage::ContentCase::kClusterScaleLog that calls log_state_->SearchTxClusterScaleOp(tx_number) and sets committed based on whether the result indicates the operation was previously found and flushed at the appropriate stage, and another for LogContentMessage::ContentCase::kMigrationLog that calls log_state_->SearchTxDataMigrationOp(tx_number) with the same pattern. Follow the same structure as the existing kSchemaLog and kSplitRangeLog cases, extracting the found boolean and flushed stage, then comparing against the current stage from log_content to set the committed flag.

eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp-410-435 (1)
410-435: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Require at least one replayed record in `VerifyReplayMessages()`.

A finish-only replay message has empty `binary_log_records()` and currently passes without checking `verify_commit_ts` or `verify_message`. Count decoded records and `CATCH_REQUIRE(record_count > 0)` so data-loss regressions cannot pass as successful replay.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp` around lines 410 - 435, The VerifyReplayMessages() function currently allows finish-only replay messages with empty binary_log_records to pass verification without checking verify_commit_ts or verify_message. Add a counter variable initialized to zero before the if statement that checks log_records.size(), increment this counter each time a record is successfully decoded in the while loop (after the blob_length offset is processed), and then add a CATCH_REQUIRE assertion after the if block to ensure record_count is greater than zero, so that data-loss regressions cannot pass as successful replay.

eloq_log_service/src/log_instance.cpp-1207-1295 (1)
1207-1295: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Always clean snapshot state on cloud snapshot failures.

`BeginSnapshot()` runs before the async save, but the RocksDB Cloud error returns skip Line 1308, leaving snapshot state uncleared after flush/serialization/proposal failures. Use an RAII cleanup guard or a single cleanup exit path so `CleanSnapshotState()` runs on every return after `WriteSnapshot()`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_instance.cpp` around lines 1207 - 1295, The snapshot state initialized by BeginSnapshot() is not being cleaned up when errors occur during the RocksDB Cloud flush, serialization, or proposal operations. Multiple return paths at different error points (CheckOrWaitForMemDBInSync failure, Flush failure, SerializeToZeroCopyStream failure, and ProposeTaskAndWait failure) skip the CleanSnapshotState() call. Create an RAII cleanup guard that wraps the scope after BeginSnapshot() and ensures CleanSnapshotState() is called in its destructor on all error returns, or consolidate all error handling to a single cleanup exit path before the function returns.

eloq_log_service/test/recover_time_test.cpp-143-145 (1)
143-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard latency summaries when no writes succeed.

Both averages divide by `cnt`; when the cluster is unavailable or no sender succeeds, this becomes an integer divide-by-zero during failure reporting.

Suggested guard

-    << (time_us / cnt) << " microseconds";
+    << (cnt == 0 ? 0 : time_us / cnt) << " microseconds";

Apply the same guard to the aggregate summary.

Also applies to: 199-202
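The guard above can also be factored into a small helper so both summaries share it. A minimal sketch; `AverageLatencyUs` is a hypothetical name, since the test computes the average inline.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average latency in microseconds, defined as 0 when
// no request succeeded so the summary never divides by zero.
uint64_t AverageLatencyUs(uint64_t total_time_us, uint64_t cnt)
{
    return cnt == 0 ? 0 : total_time_us / cnt;
}
```

Reporting 0 for an empty sample is a convention chosen here for simplicity; a caller that needs to distinguish "no data" from "zero latency" would check `cnt` separately.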
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/recover_time_test.cpp` around lines 143 - 145, The latency summary logging statements perform integer division by cnt without checking if cnt is zero, which causes a divide-by-zero error when the cluster is unavailable or no sender succeeds. Add a guard condition to check if cnt is greater than zero before computing and logging the average response time. Apply this guard at the LOG(INFO) statement around line 143-145 (the aggregate summary for write log requests) and also at the corresponding location around line 199-202 to protect both latency summary calculations from division-by-zero failures.

eloq_log_service/src/log_instance.cpp-1122-1124 (1)
1122-1124: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Put finite timeouts on synchronous BRPC calls.

These RPCs run without a controller timeout, so a stalled peer can block leader notification, the replay-size checker, or peer-change redirection indefinitely; `on_leader_stop()` then may hang while joining the checker thread. Set a bounded timeout on each controller before the synchronous call.

Suggested shape

 brpc::Controller cntl;
-cntl.set_timeout_ms(-1);
+cntl.set_timeout_ms(10000);
 stub.UpdateLogGroupLeader(&cntl, &req, &res, nullptr);

Apply the same pattern to `NotifyCheckpointer`, `AddPeer`, and `RemovePeer`.

Also applies to: 1650-1654, 1755-1757, 1846-1848

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/log_instance.cpp` around lines 1122 - 1124, Replace the timeout value of -1 with a bounded positive timeout value in all synchronous BRPC controller calls to prevent indefinite blocking. In eloq_log_service/src/log_instance.cpp at lines 1122-1124 (UpdateLogGroupLeader call), change cntl.set_timeout_ms(-1) to set a finite timeout such as cntl.set_timeout_ms(5000) or an appropriate bounded value. Apply the same pattern to the consolidated sibling locations at lines 1650-1654 (NotifyCheckpointer call), lines 1755-1757 (AddPeer call), and lines 1846-1848 (RemovePeer call), ensuring each controller has a bounded timeout before its corresponding synchronous RPC call to prevent stalled peers from blocking indefinitely.

eloq_log_service/test/recover_time_test.cpp-38-40 (1)
38-40: ⚠️ Potential issue | 🟠 Major

Make the random generator thread-local or synchronized.

All sender threads call `generate_log_request()`, which mutates the global `std::default_random_engine` via `distribution(generator)`. Since `std::default_random_engine` is not thread-safe, concurrent access from multiple threads (FLAGS_thread_num, default 48) creates a data race and undefined behavior. Use a `thread_local` engine in the sender or protect the generator with a mutex.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/test/recover_time_test.cpp` around lines 38 - 40, The global std::default_random_engine generator declared in the test is accessed concurrently by multiple sender threads that call generate_log_request(), which mutates the generator's state via distribution(generator). Since std::default_random_engine is not thread-safe, this creates a data race. Fix this by making the generator thread_local so each thread has its own isolated copy, eliminating concurrent access to the same generator instance. Alternatively, protect the generator with a mutex around distribution(generator) calls if a single shared sequence is required.
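A sketch of the `thread_local` option, assuming each sender only needs independent random values rather than one shared sequence. `NextRandom` and `CountOutOfRange` are hypothetical names used for illustration; they are not part of the test file.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical replacement for a shared global engine: every thread lazily
// constructs its own engine on first use, so no engine state is shared.
uint32_t NextRandom(uint32_t lo, uint32_t hi)
{
    thread_local std::default_random_engine engine{std::random_device{}()};
    std::uniform_int_distribution<uint32_t> dist(lo, hi);
    return dist(engine);
}

// Demo: several threads draw concurrently; returns how many draws fell
// outside the requested range.
int CountOutOfRange(int thread_num, int draws_per_thread)
{
    std::atomic<int> bad{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_num; ++t)
    {
        workers.emplace_back([&bad, draws_per_thread] {
            for (int i = 0; i < draws_per_thread; ++i)
            {
                uint32_t v = NextRandom(1, 100);
                if (v < 1 || v > 100)
                {
                    bad.fetch_add(1);
                }
            }
        });
    }
    for (auto &w : workers)
    {
        w.join();
    }
    return bad.load();
}
```

The trade-off is that per-thread engines give no reproducible global sequence; if the test needed one, the mutex variant mentioned in the comment would be the alternative.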
uint32_t LatestCommittedTxnNumber(uint32_t cc_ng) const
{
    auto it = cc_ng_info_.find(cc_ng);
    if (it == cc_ng_info_.end())
    {
        return 0;
    }
    return it->second.latest_txn_no_;
}

void UpdateLatestCommittedTxnNumber(uint32_t tx_cc_ng, uint32_t tx_ident)
{
    // access different fields of node group info with RecoverTx RPC
    // thread, no need to lock
    auto it = cc_ng_info_.find(tx_cc_ng);
    if (it == cc_ng_info_.end())
    {
        return;
    }
    CcNgInfo &info = it->second;

    // to handle the situation that committed txn number wraps around
    // uint32, assuming that active txn numbers won't span half of
    // UINT32_MAX
    if (tx_ident - info.latest_txn_no_ < (UINT32_MAX >> 1))
    {
        info.latest_txn_no_ = tx_ident;
    }
}

uint64_t LatestCommitTsOfAllNodeGroups() const
{
    uint64_t latest_commit_ts = 0;
    for (const auto &entry : cc_ng_info_)
    {
        latest_commit_ts =
            std::max(latest_commit_ts, entry.second.latest_commit_ts_);
    }
    return latest_commit_ts;
}

void UpdateLatestCommitTs(uint32_t tx_cc_ng, uint64_t commit_ts)
{
    // access different fields of node group info with RecoverTx RPC
    // thread, no need to lock
    auto it = cc_ng_info_.find(tx_cc_ng);
    if (it == cc_ng_info_.end())
    {
        return;
    }
    CcNgInfo &info = it->second;

    if (commit_ts != 0 && commit_ts > info.latest_commit_ts_)
    {
        info.latest_commit_ts_ = commit_ts;
    }
}
Guard all cc_ng_info_ accesses with log_state_mutex_.
Line [991], Line [1005], Line [1024], and Line [1036] access cc_ng_info_ without locking, while other paths mutate the same map under std::unique_lock (for example UpdateNgTerm/RemoveCcNodeGroup). Concurrent read/write on std::unordered_map is undefined behavior and can crash or corrupt state.
Suggested fix
 uint32_t LatestCommittedTxnNumber(uint32_t cc_ng) const
 {
+    std::shared_lock s_lk(log_state_mutex_);
     auto it = cc_ng_info_.find(cc_ng);
     if (it == cc_ng_info_.end())
     {
         return 0;
     }
     return it->second.latest_txn_no_;
 }

 void UpdateLatestCommittedTxnNumber(uint32_t tx_cc_ng, uint32_t tx_ident)
 {
+    std::unique_lock lk(log_state_mutex_);
     auto it = cc_ng_info_.find(tx_cc_ng);
     if (it == cc_ng_info_.end())
     {
         return;
     }
     ...
 }

 uint64_t LatestCommitTsOfAllNodeGroups() const
 {
+    std::shared_lock s_lk(log_state_mutex_);
     uint64_t latest_commit_ts = 0;
     ...
 }

 void UpdateLatestCommitTs(uint32_t tx_cc_ng, uint64_t commit_ts)
 {
+    std::unique_lock lk(log_state_mutex_);
     auto it = cc_ng_info_.find(tx_cc_ng);
     ...
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eloq_log_service/include/log_state.h` around lines 991 - 1047, The methods
LatestCommittedTxnNumber, UpdateLatestCommittedTxnNumber,
LatestCommitTsOfAllNodeGroups, and UpdateLatestCommitTs all access the
cc_ng_info_ member map without holding log_state_mutex_, while other code paths
like UpdateNgTerm and RemoveCcNodeGroup mutate this map under lock, creating a
race condition. Guard all accesses to cc_ng_info_ in each of these four methods
by wrapping the relevant code with a std::unique_lock on log_state_mutex_ at the
beginning of each method to protect both the map lookups and subsequent field
accesses.
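To make the locking suggestion concrete, here is a self-contained sketch that pairs a reader/writer lock with the wraparound-aware update from the quoted code. `LogStateSketch`, `AddNodeGroup`, and the member names are hypothetical; the real class has more fields and its own mutex, and whether readers should take a shared or an exclusive lock is left to the reviewers.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical, trimmed-down node group info.
struct CcNgInfo
{
    uint32_t latest_txn_no_{0};
};

class LogStateSketch
{
public:
    void AddNodeGroup(uint32_t cc_ng)
    {
        std::unique_lock<std::shared_mutex> lk(mux_);
        info_.try_emplace(cc_ng);
    }

    // Readers share the lock, so concurrent lookups do not block each other.
    uint32_t LatestCommittedTxnNumber(uint32_t cc_ng) const
    {
        std::shared_lock<std::shared_mutex> lk(mux_);
        auto it = info_.find(cc_ng);
        return it == info_.end() ? 0 : it->second.latest_txn_no_;
    }

    // Writers take the lock exclusively.
    void UpdateLatestCommittedTxnNumber(uint32_t cc_ng, uint32_t tx_ident)
    {
        std::unique_lock<std::shared_mutex> lk(mux_);
        auto it = info_.find(cc_ng);
        if (it == info_.end())
        {
            return;
        }
        // Unsigned subtraction wraps modulo 2^32, so the difference is small
        // when tx_ident is "ahead" of the stored value (even across a wrap)
        // and huge when it is behind.
        if (tx_ident - it->second.latest_txn_no_ < (UINT32_MAX >> 1))
        {
            it->second.latest_txn_no_ = tx_ident;
        }
    }

private:
    mutable std::shared_mutex mux_;
    std::unordered_map<uint32_t, CcNgInfo> info_;
};
```

For example, with a stored value of `UINT32_MAX - 5`, an incoming `3` yields a wrapped difference of 9 and is accepted as newer, while an incoming `50` against a stored `100` wraps to a value above the half-range threshold and is ignored.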
The log service, host manager, and store handler each carried an identical
copy of the inih/INIReader library. Move one canonical copy to
data_substrate/third_party/ini (BSD-3-Clause, kept separate from the BSL
first-party code) and point every build target at it.
- Add data_substrate/third_party/ini/{ini.c,ini.h,INIReader.cpp,INIReader.h}.
- Repoint host_manager, logservice, and the data_substrate core library, plus
the standalone tx_service / eloq_log_service / store_handler / raft_host_manager
builds, at third_party/ini via CMAKE_CURRENT_LIST_DIR-relative paths.
- Delete the three duplicate copies (12 files).
Compile-per-target is preserved, so cross-library INIReader symbol resolution
is unchanged. eloqkv keeps its own copy (separate repo).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
store_handler/eloq_data_store_service/CMakeLists.txt (1)
406-412: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix malformed `include_directories` block around Line 408.

`if()`/`endif()` is currently inside `include_directories(...)`, so the conditional is parsed as arguments instead of control flow. Move the conditional outside the call.

Suggested fix

-include_directories(
-    ${PROJECT_SOURCE_DIR}/
-    ${CMAKE_CURRENT_LIST_DIR}/../../third_party/ini
-if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
-    ${PROJECT_SOURCE_DIR}/eloqstore/
-endif ()
-)
+include_directories(
+    ${PROJECT_SOURCE_DIR}/
+    ${CMAKE_CURRENT_LIST_DIR}/../../third_party/ini
+)
+
+if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
+    include_directories(${PROJECT_SOURCE_DIR}/eloqstore/)
+endif()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@store_handler/eloq_data_store_service/CMakeLists.txt` around lines 406 - 412, The include_directories block has an if()/endif() conditional incorrectly nested inside the function call, causing CMake to parse it as arguments rather than control flow. Close the initial include_directories() call after the ${CMAKE_CURRENT_LIST_DIR}/../../third_party/ini line, then place the if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE") conditional outside and after the closing parenthesis. Inside the if block, create a separate include_directories() call containing just the ${PROJECT_SOURCE_DIR}/eloqstore/ path, followed by the endif() statement.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c1d2eddf-241c-4635-9e49-0419fe081d83
📒 Files selected for processing (11)
- CMakeLists.txt
- build_eloq_log_service.cmake
- build_tx_service.cmake
- eloq_log_service/CMakeLists.txt
- store_handler/eloq_data_store_service/CMakeLists.txt
- third_party/ini/INIReader.cpp
- third_party/ini/INIReader.h
- third_party/ini/ini.c
- third_party/ini/ini.h
- tx_service/CMakeLists.txt
- tx_service/raft_host_manager/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- CMakeLists.txt
- build_tx_service.cmake
- eloq_log_service/CMakeLists.txt
Triage and fix the actionable findings from the PR #509 review across eloq_log_service, the host-manager build, and the consolidated cmake. Findings that were speculative or by-design were verified and intentionally left as-is (single-threaded raft-apply paths, async braft snapshot callback, fire-and-forget control-plane RPCs, Buf lint where Buf is unused, etc.).

Correctness/safety fixes:
- glog init: handle readlink failure (UB) and use error_code for create_directories.
- fault_inject: FaultEntry use-after-free (store shared_ptr); inverted strike-window guard.
- resource leaks: rocksdb Checkpoint and SstFileReader iterator (unique_ptr); CleanSnapshotState on cloud-snapshot error paths.
- snapshot deserialize: validate stream state + ParseFromString; open snapshot files binary.
- log_service: redact aws_secret_key; fix node_id narrowing bypass; Start() no longer masks an early replica failure.
- log_shipping_agent: retry loop now re-initializes the channel.
- log_utils: ends_with underflow guard; integer-only size validation.
- log_state_rocksdb_impl: guard worker_num==0; runtime key-length check; shard range honors start_ts.
- metrics flag made std::atomic; rocksdb_cloud_config #else fallback.
- tests: remove AWS credential logging; check DeleteFilesInRange/DeleteBucket status; fix DropBucket IsNotFound ordering and wrong status variable; bounded binary append; synchronize shared test state; parse flags / propagate exit codes; bounds-checked record parsing + dynamic_cast guard; proto/iterator API updates.
- cmake: guard logservice install with WITH_LOG_SERVICE; resolve braft find_path before use; check protoc result; fix malformed include_directories; legacy host_manager link order + OpenSSL libs.

Remove eloq_log_service/.clang-format (identical to root). Also repoint the standalone eloq_log_service build at tx_service/tx-log-protos (its local copy was removed during the earlier inih consolidation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add data_substrate/.github/workflows/log-service-tests.yml: on PRs touching the log service, build eloq_log_service in both the RocksDB and RocksDB-Cloud(S3) configurations (compile-checking all test targets, incl. the Catch2 cloud tests), then run its RocksDB-Cloud unit tests against a MinIO S3-compatible service — no real AWS credentials or buckets required.

To let the cloud tests reach MinIO, add an env-gated S3 endpoint override to the cloud test harness (test/test_utils.h + test/log_server_rocksdb_cloud_tests.cpp): when TEST_S3_ENDPOINT is set, install a rocksdb-cloud s3_client_factory (and point the raw bucket-delete S3 client) at that endpoint with path-style addressing. It is a no-op when unset, so real-AWS / production behavior is unchanged.

Verified locally against MinIO: routing works (buckets are created in MinIO) and test_rocksdb_cloud_sst_reader passes. The DBCloud::Open-based cases currently crash inside librocksdb-cloud (CreateLoggerFromOptions -> GetAbsolutePath), independently of this change, and are excluded pending a separate investigation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_log_service/src/benchmark.cpp (1)
97-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Break out of the inner send loop after RPC failure/redirect.

Using `continue` here keeps sending on the same stale channel/stub, so leader recovery is delayed and failures can repeat for the rest of the million-iteration batch.

Suggested fix

 if (cntl.Failed())
 {
     LOG(WARNING) << "Fail to send request to " << leader << " : "
                  << cntl.ErrorText();
     braft::rtb::update_leader(FLAGS_group, braft::PeerId());
     bthread_usleep(FLAGS_timeout_ms * 1000L);
-    continue;
+    break;
 }
 if (response.response_status() !=
     txlog::LogResponse::ResponseStatus::LogResponse_ResponseStatus_Success)
 {
     LOG(WARNING) << "Fail to send request to " << leader
                  << ", redirecting to "
                  << response.write_log_response().redirect();
     braft::rtb::update_leader(
         FLAGS_group, response.write_log_response().redirect());
-    continue;
+    break;
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/benchmark.cpp` around lines 97 - 116, The code uses `continue` statements when handling RPC failures and redirect responses, which keeps the loop executing with the same stale channel/stub instead of breaking out to allow connection recovery. Replace the `continue` statement in the `if (cntl.Failed())` block and the `continue` statement in the response status check block (where response.response_status() is not Success) with `break` statements instead. This will exit the inner send loop and allow the channel/stub to be re-established on the next attempt, preventing delayed leader recovery and repeated failures.
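The effect of `break` versus `continue` can be seen in a small simulation. This is a sketch under assumptions: `SendBatch`, `BatchStats`, and the `send` callback are hypothetical stand-ins, not the benchmark's brpc/braft code, and it assumes an outer loop that rebuilds the channel, as the review comment implies.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical counters for the simulation.
struct BatchStats
{
    std::size_t sent = 0;             // requests acknowledged
    std::size_t channels_opened = 0;  // how many times the channel was rebuilt
};

// send(channel_id, request_index) returns true on success. The outer loop
// stands in for "look up the leader and build a fresh channel/stub".
inline BatchStats SendBatch(
    std::size_t total,
    std::size_t max_channels,
    const std::function<bool(std::size_t, std::size_t)> &send)
{
    BatchStats stats;
    std::size_t next = 0;
    while (next < total && stats.channels_opened < max_channels)
    {
        const std::size_t channel_id = ++stats.channels_opened;
        while (next < total)
        {
            if (!send(channel_id, next))
            {
                // Leave the inner loop so the outer loop rebuilds the
                // channel. A `continue` here would retry on the same stale
                // channel for every remaining request.
                break;
            }
            ++next;
            ++stats.sent;
        }
    }
    return stats;
}
```

If the first channel starts failing partway through the batch, the batch still completes after a single reconnect rather than failing every remaining request on the stale channel.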
🧹 Nitpick comments (1)
eloq_log_service/src/test_log_state_rocksdb.cpp (1)
93-111: ⚡ Quick win
`test_truncate_log` no longer validates truncation behavior.
The function still reports “before/after truncate,” but no truncation-triggering operation is executed, so this path cannot catch regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eloq_log_service/src/test_log_state_rocksdb.cpp` around lines 93 - 111, The test function test_truncate_log sets up a truncate_timestamp but does not invoke any truncation operation before verifying results, making the test unable to catch regressions. Replace the `(void) truncate_timestamp;` line with a call to the current truncation API (either UpdateCkptTs or TryCleanMultiStageOps) passing the truncate_timestamp value, since the old DeleteLogItems API has been removed and truncation is now driven by checkpoint timestamps. This ensures the test actually executes a truncation operation and validates that log items are properly truncated before checking the log replay list.
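As a rough illustration of what a truncation test needs to exercise, the sketch below uses a toy in-memory log. `ToyLogState` and its methods are hypothetical and do not mirror the real `UpdateCkptTs` or `TryCleanMultiStageOps` signatures. The rule that items strictly below the checkpoint timestamp are dropped is an assumption of this example.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a log state whose truncation is driven by a checkpoint
// timestamp. Not the project's LogState API.
class ToyLogState
{
public:
    void Append(std::uint64_t commit_ts, std::string payload)
    {
        items_.emplace(commit_ts, std::move(payload));
    }

    // Assumption: items with commit_ts strictly below ckpt_ts are no longer
    // needed for replay and can be dropped.
    void UpdateCheckpointTs(std::uint64_t ckpt_ts)
    {
        items_.erase(items_.begin(), items_.lower_bound(ckpt_ts));
    }

    // Commit timestamps that would still be replayed, in ascending order.
    std::vector<std::uint64_t> ReplayList() const
    {
        std::vector<std::uint64_t> result;
        for (const auto &item : items_)
        {
            result.push_back(item.first);
        }
        return result;
    }

private:
    std::multimap<std::uint64_t, std::string> items_;
};
```

A meaningful test appends items, advances the checkpoint timestamp, and then asserts on the replay list, instead of only printing it before and after.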
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/log-service-tests.yml:
- Around line 20-30: Replace the unpinned `:latest` image tags with specific
immutable digests: update the eloqdata/eloq-dev-ci-ubuntu2404 image and the
bitnami/minio service image to reference specific version digests rather than
`:latest`. Additionally, update the actions/checkout action reference from the
major version pin (e.g., `@v4`) to a fully qualified commit SHA to ensure
immutable dependency pinning. Finally, add the `persist-credentials: false`
parameter to the checkout step to prevent the GitHub token from being left in
`.git/config` after the checkout action completes.
In `@eloq_log_service/include/log_state.h`:
- Around line 1342-1350: Parse failures in snapshot reading are not being
propagated to the stream state, allowing partial snapshots to be accepted if
upstream code only checks is.fail(). At file
eloq_log_service/include/log_state.h lines 1342-1350 (anchor), 1385-1392
(sibling), and 1434-1441 (sibling), when the ParseFromString call fails and the
error is logged, set the stream is to a failed state (using setstate or similar
stream failure mechanism) before returning, ensuring that parse failures are
properly detected by upstream callers checking is.fail().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a8e51fb-f900-4967-b45c-34107b5d2c06
📒 Files selected for processing (31)
- .github/workflows/log-service-tests.yml
- CMakeLists.txt
- build_tx_service.cmake
- eloq_log_service/CMakeLists.txt
- eloq_log_service/include/fault_inject.h
- eloq_log_service/include/glog_error_logging.h
- eloq_log_service/include/log_service.h
- eloq_log_service/include/log_service_metrics.h
- eloq_log_service/include/log_shipping_agent.h
- eloq_log_service/include/log_state.h
- eloq_log_service/include/log_state_rocksdb_impl.h
- eloq_log_service/include/log_utils.h
- eloq_log_service/include/replay_service_for_test.h
- eloq_log_service/include/rocksdb_cloud_config.h
- eloq_log_service/src/benchmark.cpp
- eloq_log_service/src/fault_inject.cpp
- eloq_log_service/src/launch_server.cpp
- eloq_log_service/src/launch_sv.cpp
- eloq_log_service/src/log_instance.cpp
- eloq_log_service/src/log_state_memory_impl.cpp
- eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp
- eloq_log_service/src/log_state_rocksdb_impl.cpp
- eloq_log_service/src/test_log_state_rocksdb.cpp
- eloq_log_service/test/launch_replay_service.cpp
- eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp
- eloq_log_service/test/recover_time_test.cpp
- eloq_log_service/test/rocksdb_cloud_delete_range_test.cpp
- eloq_log_service/test/rocksdb_test.cpp
- eloq_log_service/test/test_utils.h
- store_handler/eloq_data_store_service/CMakeLists.txt
- tx_service/raft_host_manager/CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- eloq_log_service/include/log_service_metrics.h
🚧 Files skipped from review as they are similar to previous changes (24)
- eloq_log_service/test/launch_replay_service.cpp
- store_handler/eloq_data_store_service/CMakeLists.txt
- eloq_log_service/src/launch_server.cpp
- tx_service/raft_host_manager/CMakeLists.txt
- CMakeLists.txt
- eloq_log_service/test/test_utils.h
- eloq_log_service/src/fault_inject.cpp
- eloq_log_service/include/replay_service_for_test.h
- eloq_log_service/include/log_utils.h
- eloq_log_service/include/rocksdb_cloud_config.h
- build_tx_service.cmake
- eloq_log_service/include/glog_error_logging.h
- eloq_log_service/src/log_state_memory_impl.cpp
- eloq_log_service/CMakeLists.txt
- eloq_log_service/test/log_server_rocksdb_cloud_tests.cpp
- eloq_log_service/include/log_service.h
- eloq_log_service/include/log_shipping_agent.h
- eloq_log_service/src/launch_sv.cpp
- eloq_log_service/test/rocksdb_cloud_delete_range_test.cpp
- eloq_log_service/test/rocksdb_test.cpp
- eloq_log_service/src/log_instance.cpp
- eloq_log_service/src/log_state_rocksdb_impl.cpp
- eloq_log_service/include/log_state_rocksdb_impl.h
- eloq_log_service/src/log_state_rocksdb_cloud_impl.cpp
```yaml
      image: eloqdata/eloq-dev-ci-ubuntu2404:latest
      # Run as root so actions/checkout can write under /__w/_temp (mirrors
      # unit-tests.yml); otherwise checkout fails with EACCES in this image.
      options: --user root
    services:
      # bitnami/minio starts the server with its default entrypoint (no custom
      # command needed, which GitHub `services:` cannot provide). Reachable from
      # the job container by the service hostname `minio`.
      minio:
        image: bitnami/minio:latest
        env:
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
wf=".github/workflows/log-service-tests.yml"
echo "== Action refs =="
rg -n 'uses:\s*' "$wf"
echo "== Mutable image tags (latest) =="
rg -n 'image:\s+.*:latest' "$wf" || true
echo "== Checkout credential persistence setting =="
rg -n 'persist-credentials' "$wf" || true
Repository: eloqdata/tx_service
Length of output: 292
Pin workflow dependencies immutably and disable credential persistence in checkout.
Unpinned :latest images (lines 20, 29) allow mutation of dependencies without version control, creating supply-chain risk. Action @v4 (line 45) pins to major version but not immutable commit SHA. Missing persist-credentials: false in checkout leaves GitHub token in .git/config for subsequent steps, exposing credentials. Replace :latest with specific digests, update actions/checkout@v4 to a pinned commit SHA, and add persist-credentials: false to the checkout step.
🧰 Tools
🪛 zizmor (1.25.2)
[error] 20-20: unpinned image references (unpinned-images): container image is pinned to latest
(unpinned-images)
[error] 29-29: unpinned image references (unpinned-images): container image is pinned to latest
(unpinned-images)
Source: Linters/SAST tools
- log_state.h: on ParseFromString failures in the snapshot deserialize helper (schema / split-range / cluster-scale op messages), set the input stream to failbit before returning, so callers that check is.fail() detect the parse failure instead of silently accepting partially-loaded state. (read failures already set failbit themselves.)
- log-service-tests.yml: the bitnami/minio:latest service image was retired (docker pull => "manifest unknown"), failing the job at container init. GitHub `services:` can't run the official minio image (it needs a `server` command), so run MinIO as a local binary in the job container and point TEST_S3_ENDPOINT at 127.0.0.1:9000.

Verified locally: the cloud test rebuilds with the change and test_rocksdb_cloud_sst_reader passes against the binary MinIO (bucket create/drop included).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The prior run failed at Start MinIO although the server started fine: the probe used bash-only /dev/tcp but GitHub runs the step under sh, so it never succeeded. Use a POSIX-sh curl health check (/minio/health/live) and merge MinIO startup + the test into one step so the backgrounded server is alive for the test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…RVICE (#512)

Companion to eloqdata/tx_service#509. data_substrate now bundles the log service, host manager, and tx-log-protos in-tree, so the separate clone/symlink build steps are no longer needed.
- Bump the data_substrate submodule to the in-tree open-sourcing commit.
- Remove the OPEN_LOG_SERVICE cmake option (WITH_LOG_SERVICE is kept) and the stale -DOPEN_LOG_SERVICE / -DFORK_HM_PROCESS flags.
- Drop the eloq_log_service / raft_host_manager clone+symlink plumbing from GitHub Actions (build.yml), the Concourse pipelines/tasks/scripts, and scripts/git-checkout.sh / git-tag.sh; the host_manager binary copy steps are kept.
- Remove the obsolete log.pr.ent Concourse pipeline (it tested the now-absorbed eloq_log_service repo).
- Update CLAUDE.md build instructions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Squash of the 8 chore/data-substrate-third-party-workspace commits, rebased onto current main and reconciled with main's in-tree restructuring (#509):
- Build third-party deps from pinned sources: third_party/manifest.yml, scripts/third_party/{fetch,build,install,common}.sh, cmake/EloqThirdParty.cmake. brpc/braft/mimalloc/rocksdb-cloud remain submodules; others fetched from pinned upstream sources.
- tx-log-protos, log_service, and host_manager follow main's open-sourced in-tree layout (not re-introduced as submodules).
- Drop the top-level abseil-cpp submodule (workspace provides third_party/src/abseil-cpp).
- Build the workspace abseil with its default options.h (ABSL_OPTION_USE_*=2 => std::string_view/any/optional/variant under C++17+), matching how main consumes the abseil submodule via add_subdirectory(tx_service/abseil-cpp). This avoids any source changes for flat_hash_map<std::string> lookups.

Build verification pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Open-source the log service, host manager, and tx-log-protos by folding them directly into `data_substrate` as in-tree source under the repository's root Business Source License 2.0 (the same model as `eloqstore`), and remove the build toggles that distinguished the open vs proprietary variants.
- Absorb `eloq_log_service/` and `tx_service/raft_host_manager/` (were gitignored nested clones) and `tx_service/tx-log-protos/` (was a submodule) as tracked in-tree files. Drop tx-log-protos's GPL/AGPL `LICENSE` (now covered by the root BSL).
- Remove the open-source `log_service` submodule and `build_log_service.cmake`.
- Remove the `OPEN_LOG_SERVICE` option; always build the in-tree `eloq_log_service` via `build_eloq_log_service.cmake` (target `logservice`) under `WITH_LOG_SERVICE`.
- Remove the `FORK_HM_PROCESS` option; the host manager is always built, installed, and forked.
- Collapse the now-dead `OPEN_LOG_SERVICE` / `FORK_HM_PROCESS` preprocessor guards, keeping the eloq / forked code paths.
- Update module docs (`01`, `06`, `07`, `08`, `10`, `README`) and `CLAUDE.md`.

`WITH_LOG_SERVICE` is unchanged. Paired with a companion PR in `eloqdata/eloqkv` that bumps this submodule and removes the matching CI clone/symlink plumbing.

Test plan
- `cmake` configure + `cmake --build` (Debug, `WITH_DATA_STORE=ELOQDSS_ELOQSTORE`, `WITH_LOG_STATE=ROCKSDB`) succeeds; produces `eloqkv` and `data_substrate/host_manager`.
- … `OPEN_LOG_SERVICE`, `FORK_HM_PROCESS`, `build_log_service.cmake`, removed submodule sections).
- … `LogServer` constructors and the real braft `LogAgent` paths (not the stubs); `#if`/`#endif` balance confirmed by the build.

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Build Changes
- … `OPEN_LOG_SERVICE` to `WITH_LOG_SERVICE` and updated related build wiring.
New Features
Documentation
CI/Tests