fix crash in malloc_usable_size and rocksdb_gcs compile#328
Conversation
WalkthroughAdds ELOQDSS_ROCKSDB_CLOUD_GCS build paths and linking in CMake, confines AWS credential flag declarations to the CLOUD_S3 RocksDB configuration, and guards malloc-usable-size calculations in local_cc_shards.cpp to avoid dereferencing empty vectors. (47 words) Changes
Sequence Diagram(s)(omitted — changes are build/conditional and defensive memory guards without a new multi-component control flow) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (5)📓 Common learnings📚 Learning: 2025-12-16T09:47:09.981ZApplied to files:
📚 Learning: 2025-10-09T03:56:58.811ZApplied to files:
📚 Learning: 2025-11-11T07:10:40.346ZApplied to files:
📚 Learning: 2025-12-02T10:43:27.431ZApplied to files:
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CMakeLists.txt (1)
363-371: GCS variant lacks proper cloud RocksDB library configurationThe GCS variant is linked against
ROCKSDB_LIBRARIES, but this variable is only populated with S3-specific cloud libraries (rocksdb-cloud-aws). There is no corresponding GCS-specific cloud RocksDB library added toROCKSDB_LIBRARIES. The "Common RocksDB Cloud setup" section (lines 197-202) only configures include paths, not library dependencies. Add afind_librarycall for the GCS variant with the appropriate RocksDB-cloud GCS library, or ensureROCKSDB_LIBRARIESis properly populated for GCS builds.
🧹 Nitpick comments (3)
tx_service/src/cc/local_cc_shards.cpp (2)
4107-4123: Good fix: guard mi_malloc_usable_size calls with empty() checksThe added references and empty() checks around
mi_malloc_usable_sizeavoid passing null/invalid pointers when the vectors are empty, which directly addresses the reported SEGV without changing behavior for non‑empty batches. The slight under‑accounting for reserved‑but‑empty vectors is acceptable for this quota heuristic.
5030-5047: Consistent crash‑proofing for hash‑partition data sync quotaMirroring the
empty()guard pattern for the hash‑partition path keeps the quota logic consistent with the range‑partition path and eliminates the same potential crash when those vectors are empty.CMakeLists.txt (1)
293-327: ELOQDSS_ROCKSDB_CLOUD_GCS source selection looks correct; verify downloader coverageIncluding
rocksdb_cloud_data_store.cpp,purger_event_listener.cpp, andpurger_sliding_window.cppforELOQDSS_ROCKSDB_CLOUD_GCSmatches the S3 cloud path and should unblock GCS builds. Since the S3 path also pulls ins3_file_downloader.cpp, double‑check that the GCS backend either doesn’t need an equivalent downloader or that any GCS‑specific downloader source is wired in elsewhere.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txtcore/src/storage_init.cppstore_handler/eloq_data_store_service/rocksdb_config.cpptx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
📚 Learning: 2025-12-16T09:47:09.981Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
🔇 Additional comments (2)
store_handler/eloq_data_store_service/rocksdb_config.cpp (1)
280-284: LGTM! Correctly isolates AWS credentials to S3 backend.The change properly restricts AWS credential definitions to the S3 backend only, removing the dependency on AWS SDK when compiling with GCS backend. This directly addresses the "rocksdb_gcs compile" issue mentioned in the PR title.
core/src/storage_init.cpp (1)
72-78: LGTM! AWS credentials correctly scoped to S3 and DynamoDB backends.The change appropriately removes the GCS backend from the AWS credential conditional, ensuring GCS compilation doesn't require AWS SDK dependencies. This aligns with the parallel changes in
rocksdb_config.cpp.Minor: Line 75 has trailing whitespace before the closing parenthesis.
3b4e9de to
2e52d5d
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
core/src/storage_init.cpp (1)
75-75: Minor: Trailing whitespace.Line 75 has trailing whitespace after the condition. Consider removing it for consistency.
🔎 Proposed fix
-#elif defined(DATA_STORE_TYPE_DYNAMODB) || defined(LOG_STATE_TYPE_RKDB_S3) +#elif defined(DATA_STORE_TYPE_DYNAMODB) || defined(LOG_STATE_TYPE_RKDB_S3)tx_service/src/cc/local_cc_shards.cpp (3)
24-28:<malloc.h>include is fine but may be non‑portableIncluding
<malloc.h>is needed formalloc_usable_size, but it is GNU/glibc‑specific. If this code must build on non‑Linux or non‑glibc platforms (e.g. macOS, BSD), you may need a small portability shim (e.g.#ifdef __GLIBC__with an alternative such as<malloc/malloc.h>or a fallback) rather than unconditionally including<malloc.h>.Please confirm whether tx_service is guaranteed to build only on glibc/Linux, or if you need a guarded include for other targets.
4098-4142: ASan/mimalloc guarded*_usable_sizeusage looks correct; consider de‑duplicating and double‑check allocator assumptionsThe new logic:
- Skips
malloc_usable_size/mi_malloc_usable_sizecalls when the vectors are empty.- Uses
malloc_usable_sizeunder__SANITIZE_ADDRESS__andmi_malloc_usable_sizeotherwise, matching the allocator expected in ASan vs mimalloc builds.- Avoids the crash seen when calling
mi_malloc_usable_sizeon buffers not owned by mimalloc.This looks reasonable and should eliminate the observed SEGV in non‑jemalloc range‑partition paths.
Two follow‑ups to consider:
Allocator/build assumption: This relies on ASan builds not also using mimalloc as the allocator for these vectors. If you ever combine ASan with mimalloc overrides, the
malloc_usable_sizebranch could again be incompatible with the actual allocator.Duplication: The ASan/mimalloc +
.empty()pattern is duplicated here and in the hash‑partition path. A tiny helper would keep behavior consistent and easier to audit, for example:template <class Vec> static size_t UsableStorageSize(const Vec& v) { if (v.empty()) return 0; #ifdef __SANITIZE_ADDRESS__ return malloc_usable_size(v.data()); #else return mi_malloc_usable_size(v.data()); #endif }and then sum
UsableStorageSize(...)for each vector.Please confirm that ASan builds in this repo do not also route allocations through mimalloc; if they do, we should revisit which
*_usable_sizeAPI is correct under that configuration and potentially centralize this in a helper as above.
5042-5084: Hash‑partitionflush_data_sizeaccounting changes mirror the range path; keep them in sync via a helperThe hash‑partition branch now mirrors the new range‑partition behavior: under non‑jemalloc builds it:
- Guards calls on
.empty()vectors.- Uses
malloc_usable_sizeunder__SANITIZE_ADDRESS__andmi_malloc_usable_sizeotherwise.That aligns the two code paths and should prevent the same crash in the hash‑partition case.
Given the logic is now duplicated almost verbatim between range and hash paths, it would be safer long‑term to factor out a shared helper (as suggested in the previous comment) so future fixes apply uniformly.
If you introduce a helper for the range path, please apply it here too to avoid divergence between partition types; also verify in your CI matrix that both ASan and non‑ASan, WITH_JEMALLOC and non‑JEMALLOC builds all see the intended branch.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txtcore/src/storage_init.cppstore_handler/eloq_data_store_service/rocksdb_config.cpptx_service/src/cc/local_cc_shards.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- store_handler/eloq_data_store_service/rocksdb_config.cpp
- CMakeLists.txt
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
🔇 Additional comments (1)
core/src/storage_init.cpp (1)
72-78: Remove trailing whitespace on line 75.The preprocessor guard logic in lines 72-78 is correct. When
DATA_STORE_TYPE_ELOQDSS_ROCKSDB_CLOUD_S3is defined, the flags are defined inrocksdb_config.cpp(lines 281-283) and declared here; when onlyLOG_STATE_TYPE_RKDB_S3orDATA_STORE_TYPE_DYNAMODBare defined, the flags are defined in this file's elif branch. No issue with flag availability across configurations.The only concern is the trailing space at the end of line 75 (
#elif defined(...)).Likely an incorrect or invalid review comment.
2e52d5d to
939be54
Compare
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.