Skip to content

Add s3 url config#288

Merged
githubzilla merged 11 commits into
mainfrom
add_s3_url_config
Dec 18, 2025
Merged

Add s3 url config#288
githubzilla merged 11 commits into
mainfrom
add_s3_url_config

Conversation

@githubzilla

@githubzilla githubzilla commented Dec 12, 2025

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • URL-based cloud storage config for RocksDB Cloud (s3://, gs://, http(s)://) that takes precedence over legacy options and allows embedding shard path in the URL.
  • Documentation

    • Added migration guide, protocol examples (S3/GCS/MinIO/HTTPs), clarified precedence and bucket_prefix behavior, and updated deployment/production examples.
  • Bug Fixes / Reliability

    • Restored HTTPS verification by default, enabled AWS SDK SigPipe handling, and added OSS URL validation with fail-fast startup logging.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 12, 2025

Copy link
Copy Markdown

Walkthrough

Adds URL-based OSS configuration (s3://, gs://, http(s)://) for RocksDB Cloud and txlog, with parsing/validation, new flags, precedence over legacy flags, propagation through configs/factories/services/dump utility, and minor AWS SDK SigPipe initialization changes.

Changes

Cohort / File(s) Summary
Core initialization
core/src/log_init.cpp
Adds txlog_rocksdb_cloud_object_store_service_url flag; parses/validates OSS URL during log init, fatals on invalid URL, and overrides legacy cloud fields (bucket/prefix/object_path/endpoint) when set; logs effective OSS settings.
Global RocksDB config & parsing
store_handler/eloq_data_store_service/rocksdb_config.cpp, store_handler/eloq_data_store_service/rocksdb_config.h
Adds rocksdb_cloud_object_store_service_url flag; introduces oss_url_ and IsOssUrlConfigured(); adds S3UrlComponents, ParseS3Url() and helper to_lower_copy; prefers OSS URL over legacy options when provided.
Factory interface
store_handler/eloq_data_store_service/data_store_factory.h
Adds virtual IsOssUrlConfigured() const with default false so derived factories can report OSS-URL usage.
Data store factory & shard handling
store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h
Adds IsOssUrlConfigured() const override; changes shard-path composition: for OSS URLs append /ds_<shard_id> to the OSS URL (trimming trailing slash), otherwise normalize/append legacy object_path_ + ds_<shard_id>/.
Data store service URL handling
store_handler/eloq_data_store_service/data_store_service.cpp
Only appends ds_<shard_id>/ to downloader URL when OSS URL is not configured; trims trailing slash before appending when needed.
RocksDB Cloud runtime & client
store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp, store_handler/eloq_data_store_service/rocksdb_cloud_data_store.h
Compute effective bucket/object_path/endpoint from parsed OSS URL and override source/destination configs; endpoint-aware S3 client factory selection and transfer-manager usage; remove unconditional HTTPS SSL-disable; make cloud_config_ mutable to apply overrides; logs/fatals on invalid OSS URL.
Dump utility
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp
Adds S3UrlComponents, ParseS3Url() and CLI integration for OSS URL; fills src/dest bucket/prefix/object_path from parsed components when OSS URL provided; updates usage/help and logging.
Docs / flags
store_handler/eloq_data_store_service/README.md, store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md
Documents new URL-based configuration, migration guide and examples (s3/gs/http(s)), precedence rules, bucket_prefix guidance, and backward compatibility notes; updates examples and flag docs.
AWS SDK init (SigPipe)
core/src/data_substrate.cpp, store_handler/eloq_data_store_service/main.cpp
Enable AWS SDK SigPipe handler via httpOptions.installSigPipeHandler = true during Aws::InitAPI for relevant RocksDB Cloud code paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus areas:
    • OSS URL parsing/validation and fatal handling (rocksdb_config.{h,cpp}, rocksdb_cloud_dump.cpp, core/src/log_init.cpp).
    • Propagation of effective bucket/object_path/endpoint to source/destination configs (rocksdb_cloud_data_store.cpp).
    • Endpoint-driven S3 client factory selection and transfer-manager enable/disable logic (rocksdb_cloud_data_store.cpp).
    • Consistency of shard-path handling across factory and service (rocksdb_cloud_data_store_factory.h, data_store_service.cpp).
    • Impact of making cloud_config_ mutable (rocksdb_cloud_data_store.h).

Possibly related PRs

  • Restructure for eloqconverge #168 — modifies core/src/log_init.cpp and introduces/adjusts txlog RocksDB Cloud OSS URL/initialization logic touching similar code paths.

Suggested reviewers

  • liunyl
  • lzxddz

Poem

🐇 I sniffed an OSS URL in the burrow light,

s3, gs, http — one path to set things right.
Shards now hop to where the bucket lies,
Old flags still sleep beneath softer skies.
Hooray for tidy routes and slimmer tries! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete—it only contains the template checklist with no substantive content. The actual description lacks explanation of changes, rationale, testing details, and issue references. Fill in the PR description with details about what the changes accomplish, why they're needed, which tests were added/run, and include references to related issues or RFCs.
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Add s3 url config' is vague and does not clearly describe the main change. While related to the changeset, it lacks specificity about what functionality is being added. Use a more descriptive title like 'Add S3 URL-based configuration for RocksDB Cloud' or 'Support S3/GCS URL config for object store service' to better convey the scope and nature of the changes.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch add_s3_url_config

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9435310 and f832778.

📒 Files selected for processing (2)
  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (6 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.h (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
📚 Learning: 2025-12-16T09:47:03.594Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.

Applied to files:

  • store_handler/eloq_data_store_service/rocksdb_config.h
  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp
🔇 Additional comments (5)
store_handler/eloq_data_store_service/rocksdb_config.h (2)

84-91: LGTM!

The to_lower_copy helper correctly uses pass-by-value with in-place transformation, leveraging move semantics on return. Using unsigned char for the lambda parameter properly handles extended ASCII.


217-222: LGTM!

Clean predicate method for checking OSS URL configuration state.

store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (3)

255-290: LGTM!

The OSS URL parsing logic correctly takes precedence over legacy configuration. Error handling provides clear messages, and the logging appropriately omits the raw URL while showing the parsed components.


529-531: Good addition for reliability.

Installing the SIGPIPE handler prevents process termination from broken pipe signals during S3 operations—important for robustness in network-facing tools.


556-563: LGTM!

Explicit separation of SetBucketName and SetBucketPrefix calls is clearer than combined configuration, and correctly handles both OSS URL (empty prefix) and legacy (prefix from flag) cases.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
store_handler/eloq_data_store_service/README.md (1)

134-171: Docs: clarify shard suffix behavior for oss_url
With URL-based config, the factory currently appends /ds_<shard_id> when rocksdb_cloud_oss_url is set (see RocksDBCloudDataStoreFactory::CreateDataStore). Please document whether users should provide a base path only (recommended) and that DSS will suffix per-shard, or document how to include shard in URL without double-suffixing.

🧹 Nitpick comments (6)
store_handler/eloq_data_store_service/data_store_service.cpp (1)

3049-3054: Consider guarding against shard-path collisions under OSS URL mode.
With IsOssUrlConfigured()==true, multiple shards will share the same base URL unless the user includes ds_{shard_id}/ (or equivalent) in rocksdb_cloud_oss_url / derived object path. That’s easy to misconfigure and can cause cross-shard file-cache sync to read/write the wrong object prefix.

  • Add a LOG(WARNING) when IsOssUrlConfigured()==true and the resolved object path doesn’t appear shard-scoped (even a heuristic), or explicitly document the requirement (see doc comment below).
store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md (1)

90-118: Doc gap: call out shard subpath responsibility when using rocksdb_cloud_oss_url.
Given the code now does not auto-append ds_{shard_id}/ when OSS URL is configured, the docs should explicitly state whether users must include shard-specific subpaths in rocksdb_cloud_oss_url (and what the expected layout is) to avoid shard collisions.

store_handler/eloq_data_store_service/rocksdb_config.h (1)

199-207: oss_url_ plumbing is straightforward; watch for code duplication across TU’s
oss_url_ + IsOssUrlConfigured() is a clean addition. Main follow-up: prefer reusing this header’s ParseS3Url everywhere (e.g., dump utility) to avoid divergent parsing rules.

store_handler/eloq_data_store_service/README.md (1)

207-309: Migration guide is useful; add one explicit “per-shard layout” note
Consider adding a short note showing the resulting object layout, e.g. base .../rocksdb_cloud becoming .../rocksdb_cloud/ds_0, .../rocksdb_cloud/ds_1, etc (or whichever behavior you choose), to prevent misconfiguration during migration.

store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)

100-205: Avoid duplicated ParseS3Url(): reuse store_handler/eloq_data_store_service/rocksdb_config.h
This file redefines S3UrlComponents + ParseS3Url (Line 100-205), which now duplicates the shared implementation in rocksdb_config.h. This is likely to drift (and already differs from other S3 client handling in the repo).

Suggestion: include and use EloqDS::ParseS3Url / EloqDS::S3UrlComponents instead of local copies, and delete the local definitions.


233-263: oss_url precedence parsing looks fine; reconsider credential requirements for parity with server
The precedence logic (Line 233-263) is clear. One concern: later validation still requires explicit --aws_access_key_id/--aws_secret_key even though the server path supports default credential providers. If intentional, please document it; otherwise, consider allowing empty keys to use the default provider for this tool as well.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3164008 and abc7dc3.

📒 Files selected for processing (10)
  • core/src/log_init.cpp (3 hunks)
  • store_handler/eloq_data_store_service/README.md (3 hunks)
  • store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md (1 hunks)
  • store_handler/eloq_data_store_service/data_store_factory.h (1 hunks)
  • store_handler/eloq_data_store_service/data_store_service.cpp (1 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (3 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (6 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.cpp (2 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
store_handler/eloq_data_store_service/data_store_factory.h (1)
tx_service/include/store/data_store_handler.h (12)
  • bool (98-101)
  • bool (103-106)
  • bool (214-223)
  • bool (225-229)
  • bool (235-240)
  • bool (366-371)
  • bool (381-388)
  • bool (390-394)
  • bool (396-400)
  • bool (410-413)
  • bool (415-418)
  • bool (435-438)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)
  • ParseS3Url (120-205)
  • ParseS3Url (120-120)
core/src/log_init.cpp (1)
core/include/data_substrate.h (1)
  • CheckCommandLineFlagIsDefault (311-322)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
  • ParseS3Url (94-179)
🔇 Additional comments (8)
store_handler/eloq_data_store_service/rocksdb_config.cpp (2)

297-305: Good add: rocksdb_cloud_oss_url flag + clear help text.
Looks consistent with the precedence model described in docs/other wiring.


520-526: Good: plumb OSS URL through config with cmdline-over-file precedence.
This matches how other flags are read in this file.

core/src/log_init.cpp (1)

83-88: Flag addition looks fine; keep naming consistent with store-side (*_oss_url).
No concerns.

store_handler/eloq_data_store_service/data_store_factory.h (1)

101-106: Virtual method is properly overridden in the relevant factory implementation.
RocksDBCloudDataStoreFactory correctly overrides IsOssUrlConfigured() and returns the config state. Other factories (RocksDBDataStoreFactory, EloqStoreDataStoreFactory) appropriately inherit the default false since they don't support OSS URLs. The vtable change is a standard consequence of adding a virtual method and is safe for a monolithic internal codebase.

store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1)

144-147: IsOssUrlConfigured() override looks good
Simple delegation keeps factory behavior consistent with config ownership.

store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)

238-240: Good: don’t silently disable SSL verification
Keeping SSL verification enabled by default is the right security posture; if you need test-only behavior, gate it behind an explicit flag/env.


377-392: Transfer-manager toggle based on effective endpoint is correct and properly propagated

The code logic is sound: when effective_endpoint_url is empty (S3 standard endpoint), AWS Transfer Manager is enabled for optimized multipart operations; when set (MinIO or custom endpoint), it is disabled due to known IOError issues. The endpoint configuration properly flows through BuildS3ClientFactorycfs_options_.s3_client_factory → storage provider, and all cloud operations including manifest listing respect the configured endpoint. The behavior correctly handles both s3:// URLs (empty endpoint) and http(s)://host/... URLs (custom endpoint with override).

store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)

525-532: Good: bucket prefix is applied explicitly to src/dest buckets
Setting SetBucketName and SetBucketPrefix separately for both buckets improves clarity and matches the “prefix cleared on oss_url” behavior.

Comment thread core/src/log_init.cpp
Comment thread store_handler/eloq_data_store_service/data_store_service.cpp
Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp
Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp Outdated
Comment thread store_handler/eloq_data_store_service/rocksdb_config.h

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
core/src/log_init.cpp (1)

438-446: Don't log the raw OSS URL (can leak credentials for http(s) URLs).

Line 440 logs txlog_rocksdb_cloud_config.oss_url_, which may contain sensitive userinfo or query parameters. Since you already log the parsed, non-sensitive fields (bucket_name, object_path, endpoint_url), the raw URL should be removed from the log message.

Apply this diff:

-                LOG(INFO) << "Using OSS URL configuration for txlog (overrides legacy config if "
-                             "present): "
-                          << txlog_rocksdb_cloud_config.oss_url_
-                          << " (bucket: " << txlog_rocksdb_cloud_config.bucket_name_
+                LOG(INFO) << "Using OSS URL configuration for txlog (overrides legacy config if "
+                             "present)"
+                          << " (bucket: " << txlog_rocksdb_cloud_config.bucket_name_
                           << ", object_path: " << txlog_rocksdb_cloud_config.object_path_
                           << ", endpoint: "
                           << (txlog_rocksdb_cloud_config.endpoint_url_.empty() ? "default"
                                                                                 : txlog_rocksdb_cloud_config.endpoint_url_)
                           << ")";
store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)

296-343: Effective OSS URL config is not used consistently in later operations.

You compute effective_bucket_name, effective_bucket_prefix, effective_object_path, and effective_endpoint_url (lines 296-323) and apply them to cfs_options_ (lines 335-342). However, later code still references the original cloud_config_ fields:

  • Line 431-433: FindMaxTermFromCloudManifestFiles uses cloud_config_.bucket_prefix_, cloud_config_.bucket_name_, cloud_config_.object_path_
  • Lines 647-651: PurgerEventListener construction uses cloud_config_.bucket_prefix_ and cloud_config_.object_path_

When OSS URL is configured, these operations will search/operate in the wrong location (using legacy config instead of effective values).

Persist the effective values to class members (e.g., add effective_bucket_name_, effective_bucket_prefix_, effective_object_path_, effective_endpoint_url_ to the class) when parsing OSS URL, then update all later code to use these members instead of cloud_config_.*.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3486fd8 and 9a103e2.

📒 Files selected for processing (6)
  • core/src/log_init.cpp (3 hunks)
  • store_handler/eloq_data_store_service/README.md (3 hunks)
  • store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md (1 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (3 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (6 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md
  • store_handler/eloq_data_store_service/rocksdb_config.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
  • ParseS3Url (94-179)
core/src/log_init.cpp (2)
store_handler/eloq_data_store_service/rocksdb_config.cpp (2)
  • CheckCommandLineFlagIsDefault (308-320)
  • CheckCommandLineFlagIsDefault (308-308)
core/include/data_substrate.h (1)
  • CheckCommandLineFlagIsDefault (311-322)
🔇 Additional comments (4)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)

243-275: OSS URL parsing and precedence logic is correct.

The implementation correctly:

  • Parses the OSS URL when provided
  • Validates the URL format with descriptive error messages
  • Overrides legacy configuration (bucket_name, bucket_prefix, object_path, s3_endpoint_url)
  • Clears bucket_prefix as expected for URL-based config
  • Logs the effective configuration
core/src/log_init.cpp (1)

83-88: LGTM!

The new txlog_rocksdb_cloud_object_store_service_url flag is well-defined with clear documentation of supported formats and precedence semantics.

store_handler/eloq_data_store_service/README.md (1)

134-309: Excellent documentation of the OSS URL migration.

The migration guide is comprehensive and well-structured:

  • Clear rationale for the new URL-based configuration
  • Multiple practical examples covering AWS S3, MinIO, HTTPS endpoints, GCS, and txlog
  • Important notes on precedence, bucket_prefix deprecation, backward compatibility, and protocol support
  • Helpful validation guidance

This will significantly aid users in migrating to the new configuration format.

store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)

238-239: Good: SSL verification no longer unconditionally disabled.

The comment documents that SSL verification can be disabled for test environments, but the code no longer does this by default. This is a security improvement over previous versions.

Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp
Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp
Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
store_handler/eloq_data_store_service/rocksdb_config.h (1)

137-152: Missing validation for empty endpoint host in http/https URLs.

This issue was flagged in a previous review. URLs like https:///bucket/path would pass validation but produce an empty host. The first_slash == 0 case should be rejected.

core/src/log_init.cpp (1)

446-458: Avoid logging raw OSS URL - potential credential exposure.

This was flagged in a previous review. The raw oss_url_ may contain userinfo or query params for http(s) URLs. Since you already log bucket, object_path, and endpoint separately, remove the raw URL from the log message.

🧹 Nitpick comments (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)

74-82: Code duplication with rocksdb_cloud_dump.cpp.

This struct and the ParseS3Url function below are nearly identical to those in store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (lines 126-214). Consider extracting these into a shared header to avoid maintaining two copies.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a103e2 and 97c78f8.

📒 Files selected for processing (6)
  • core/src/log_init.cpp (3 hunks)
  • store_handler/eloq_data_store_service/data_store_factory.h (1 hunks)
  • store_handler/eloq_data_store_service/data_store_service.cpp (1 hunks)
  • store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.cpp (2 hunks)
  • store_handler/eloq_data_store_service/rocksdb_config.h (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • store_handler/eloq_data_store_service/rocksdb_config.cpp
  • store_handler/eloq_data_store_service/data_store_service.cpp
  • store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/log_init.cpp (2)
core/include/data_substrate.h (1)
  • CheckCommandLineFlagIsDefault (311-322)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
  • ParseS3Url (94-182)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)
  • ParseS3Url (127-215)
  • ParseS3Url (127-127)
store_handler/eloq_data_store_service/data_store_factory.h (1)
tx_service/include/store/data_store_handler.h (12)
  • bool (98-101)
  • bool (103-106)
  • bool (214-223)
  • bool (225-229)
  • bool (235-240)
  • bool (366-371)
  • bool (381-388)
  • bool (390-394)
  • bool (396-400)
  • bool (410-413)
  • bool (415-418)
  • bool (435-438)
🔇 Additional comments (4)
store_handler/eloq_data_store_service/data_store_factory.h (1)

101-110: LGTM!

The new virtual method follows the established pattern of the base class, providing a sensible default (false) that derived factories can override when OSS URL configuration is in use.

store_handler/eloq_data_store_service/rocksdb_config.h (1)

202-213: LGTM!

The oss_url_ member and IsOssUrlConfigured() method are cleanly implemented, providing a clear way to check if OSS URL configuration is active.

core/src/log_init.cpp (2)

83-89: LGTM!

The flag definition is well-documented with clear format examples and precedence semantics.


302-309: LGTM!

The configuration loading follows the established precedence pattern (command-line flag over config file).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (3)

82-109: CLI help/usage still reads "legacy-first"; update to make object_store_service_url the primary path.

The usage header (lines 86-87) prominently displays --bucket_name=BUCKET --bucket_prefix=PREFIX, while --object_store_service_url appears later in the options list. Since the OSS URL is now the preferred configuration method, consider restructuring the usage message to feature it prominently in the header and grouping legacy flags in a separate "Legacy Options" section.


342-407: Security: HTTPS connections will not verify SSL certificates.

Line 390 unconditionally disables SSL verification (config.verifySSL = false;) even when the endpoint uses HTTPS. This creates a man-in-the-middle vulnerability for HTTPS connections, which is especially problematic now that --object_store_service_url explicitly supports https:// URLs.

While the code correctly detects HTTPS endpoints (lines 368-370), it then disables verification regardless. If insecure mode is needed for local testing, gate it behind an explicit command-line flag (e.g., --insecure_skip_verify_ssl) rather than unconditionally disabling verification.

Add a new flag and conditional logic:

+DEFINE_bool(insecure_skip_verify_ssl, false, "Skip SSL certificate verification for HTTPS (INSECURE - use only for testing)");
+
 // Define command line flags
 DEFINE_string(aws_access_key_id, "", "AWS access key ID");

Then in the client factory:

         {
             config.scheme = Aws::Http::Scheme::HTTPS;
+            config.verifySSL = !FLAGS_insecure_skip_verify_ssl;
         }
         else
         {
             config.scheme = Aws::Http::Scheme::HTTP;
+            config.verifySSL = false;  // No SSL for HTTP
         }
-        // Disable SSL verification for HTTPS
-        config.verifySSL = false;

111-230: Eliminate code duplication: reuse ParseS3Url from rocksdb_config.h.

The ParseS3Url function and S3UrlComponents struct are duplicated verbatim from store_handler/eloq_data_store_service/rocksdb_config.h. This creates maintenance burden—bug fixes or enhancements must be applied in multiple places.

Remove the duplicate definition and include rocksdb_config.h instead to reference EloqDS::ParseS3Url.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3a29b0 and 9435310.

📒 Files selected for processing (1)
  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
📚 Learning: 2025-12-16T09:47:03.594Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.

Applied to files:

  • store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp
🔇 Additional comments (4)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (4)

47-67: Good flag structure with clear legacy markers.

The flag definitions appropriately mark legacy options and clearly indicate that object_store_service_url takes precedence. This helps users migrate to the new configuration approach.


257-292: Well-implemented OSS URL configuration with proper precedence and security.

The implementation correctly:

  • Gives object_store_service_url precedence over legacy flags
  • Validates the parsed URL before using it
  • Sets bucket_prefix to empty (as required when using OSS URL per learnings)
  • Logs only parsed components (lines 278-283) without exposing the raw URL, which addresses the previous security concern about credential leakage in http(s) URLs

529-534: Appropriate AWS SDK configuration for a utility tool.

The AWS SDK initialization correctly enables the SigPipe handler (line 531) to prevent crashes from broken connections, and sets Info-level logging (lines 532-533) which is suitable for a diagnostic utility.


557-565: Symmetric bucket configuration for source and destination.

The code correctly configures both source and destination buckets with the same parameters, which is appropriate for this dump utility. When object_store_service_url is used, bucket_prefix is empty (as set in line 274), aligning with the expected behavior per learnings.

Comment thread store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp Outdated
@githubzilla githubzilla merged commit afe9672 into main Dec 18, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants