Add s3 url config#288
Conversation
WalkthroughAdds URL-based OSS configuration (s3://, gs://, http(s)://) for RocksDB Cloud and txlog, with parsing/validation, new flags, precedence over legacy flags, propagation through configs/factories/services/dump utility, and minor AWS SDK SigPipe initialization changes. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (2)📓 Common learnings📚 Learning: 2025-12-16T09:47:03.594ZApplied to files:
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
store_handler/eloq_data_store_service/README.md (1)
134-171: Docs: clarify shard suffix behavior for oss_url
With URL-based config, the factory currently appends/ds_<shard_id>whenrocksdb_cloud_oss_urlis set (seeRocksDBCloudDataStoreFactory::CreateDataStore). Please document whether users should provide a base path only (recommended) and that DSS will suffix per-shard, or document how to include shard in URL without double-suffixing.
🧹 Nitpick comments (6)
store_handler/eloq_data_store_service/data_store_service.cpp (1)
3049-3054: Consider guarding against shard-path collisions under OSS URL mode.
WithIsOssUrlConfigured()==true, multiple shards will share the same base URL unless the user includesds_{shard_id}/(or equivalent) inrocksdb_cloud_oss_url/ derived object path. That’s easy to misconfigure and can cause cross-shard file-cache sync to read/write the wrong object prefix.
- Add a
LOG(WARNING)whenIsOssUrlConfigured()==trueand the resolved object path doesn’t appear shard-scoped (even a heuristic), or explicitly document the requirement (see doc comment below).store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md (1)
90-118: Doc gap: call out shard subpath responsibility when usingrocksdb_cloud_oss_url.
Given the code now does not auto-appendds_{shard_id}/when OSS URL is configured, the docs should explicitly state whether users must include shard-specific subpaths inrocksdb_cloud_oss_url(and what the expected layout is) to avoid shard collisions.store_handler/eloq_data_store_service/rocksdb_config.h (1)
199-207: oss_url_ plumbing is straightforward; watch for code duplication across TU’s
oss_url_+IsOssUrlConfigured()is a clean addition. Main follow-up: prefer reusing this header’sParseS3Urleverywhere (e.g., dump utility) to avoid divergent parsing rules.store_handler/eloq_data_store_service/README.md (1)
207-309: Migration guide is useful; add one explicit “per-shard layout” note
Consider adding a short note showing the resulting object layout, e.g. base.../rocksdb_cloudbecoming.../rocksdb_cloud/ds_0,.../rocksdb_cloud/ds_1, etc (or whichever behavior you choose), to prevent misconfiguration during migration.store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)
100-205: Avoid duplicated ParseS3Url(): reuse store_handler/eloq_data_store_service/rocksdb_config.h
This file redefinesS3UrlComponents+ParseS3Url(Line 100-205), which now duplicates the shared implementation inrocksdb_config.h. This is likely to drift (and already differs from other S3 client handling in the repo).Suggestion: include and use
EloqDS::ParseS3Url/EloqDS::S3UrlComponentsinstead of local copies, and delete the local definitions.
233-263: oss_url precedence parsing looks fine; reconsider credential requirements for parity with server
The precedence logic (Line 233-263) is clear. One concern: later validation still requires explicit--aws_access_key_id/--aws_secret_keyeven though the server path supports default credential providers. If intentional, please document it; otherwise, consider allowing empty keys to use the default provider for this tool as well.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
core/src/log_init.cpp(3 hunks)store_handler/eloq_data_store_service/README.md(3 hunks)store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md(1 hunks)store_handler/eloq_data_store_service/data_store_factory.h(1 hunks)store_handler/eloq_data_store_service/data_store_service.cpp(1 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp(3 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h(2 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp(6 hunks)store_handler/eloq_data_store_service/rocksdb_config.cpp(2 hunks)store_handler/eloq_data_store_service/rocksdb_config.h(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
store_handler/eloq_data_store_service/data_store_factory.h (1)
tx_service/include/store/data_store_handler.h (12)
bool(98-101)bool(103-106)bool(214-223)bool(225-229)bool(235-240)bool(366-371)bool(381-388)bool(390-394)bool(396-400)bool(410-413)bool(415-418)bool(435-438)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)
ParseS3Url(120-205)ParseS3Url(120-120)
core/src/log_init.cpp (1)
core/include/data_substrate.h (1)
CheckCommandLineFlagIsDefault(311-322)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(94-179)
🔇 Additional comments (8)
store_handler/eloq_data_store_service/rocksdb_config.cpp (2)
297-305: Good add:rocksdb_cloud_oss_urlflag + clear help text.
Looks consistent with the precedence model described in docs/other wiring.
520-526: Good: plumb OSS URL through config with cmdline-over-file precedence.
This matches how other flags are read in this file.core/src/log_init.cpp (1)
83-88: Flag addition looks fine; keep naming consistent with store-side (*_oss_url).
No concerns.store_handler/eloq_data_store_service/data_store_factory.h (1)
101-106: Virtual method is properly overridden in the relevant factory implementation.
RocksDBCloudDataStoreFactorycorrectly overridesIsOssUrlConfigured()and returns the config state. Other factories (RocksDBDataStoreFactory,EloqStoreDataStoreFactory) appropriately inherit the defaultfalsesince they don't support OSS URLs. The vtable change is a standard consequence of adding a virtual method and is safe for a monolithic internal codebase.store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1)
144-147: IsOssUrlConfigured() override looks good
Simple delegation keeps factory behavior consistent with config ownership.store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
238-240: Good: don’t silently disable SSL verification
Keeping SSL verification enabled by default is the right security posture; if you need test-only behavior, gate it behind an explicit flag/env.
377-392: Transfer-manager toggle based on effective endpoint is correct and properly propagatedThe code logic is sound: when
effective_endpoint_urlis empty (S3 standard endpoint), AWS Transfer Manager is enabled for optimized multipart operations; when set (MinIO or custom endpoint), it is disabled due to known IOError issues. The endpoint configuration properly flows throughBuildS3ClientFactory→cfs_options_.s3_client_factory→ storage provider, and all cloud operations including manifest listing respect the configured endpoint. The behavior correctly handles boths3://URLs (empty endpoint) andhttp(s)://host/...URLs (custom endpoint with override).store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
525-532: Good: bucket prefix is applied explicitly to src/dest buckets
SettingSetBucketNameandSetBucketPrefixseparately for both buckets improves clarity and matches the “prefix cleared on oss_url” behavior.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
core/src/log_init.cpp (1)
438-446: Don't log the raw OSS URL (can leak credentials for http(s) URLs).Line 440 logs
txlog_rocksdb_cloud_config.oss_url_, which may contain sensitive userinfo or query parameters. Since you already log the parsed, non-sensitive fields (bucket_name, object_path, endpoint_url), the raw URL should be removed from the log message.Apply this diff:
- LOG(INFO) << "Using OSS URL configuration for txlog (overrides legacy config if " - "present): " - << txlog_rocksdb_cloud_config.oss_url_ - << " (bucket: " << txlog_rocksdb_cloud_config.bucket_name_ + LOG(INFO) << "Using OSS URL configuration for txlog (overrides legacy config if " + "present)" + << " (bucket: " << txlog_rocksdb_cloud_config.bucket_name_ << ", object_path: " << txlog_rocksdb_cloud_config.object_path_ << ", endpoint: " << (txlog_rocksdb_cloud_config.endpoint_url_.empty() ? "default" : txlog_rocksdb_cloud_config.endpoint_url_) << ")";store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
296-343: Effective OSS URL config is not used consistently in later operations.You compute
effective_bucket_name,effective_bucket_prefix,effective_object_path, andeffective_endpoint_url(lines 296-323) and apply them tocfs_options_(lines 335-342). However, later code still references the originalcloud_config_fields:
- Line 431-433:
FindMaxTermFromCloudManifestFilesusescloud_config_.bucket_prefix_,cloud_config_.bucket_name_,cloud_config_.object_path_- Lines 647-651:
PurgerEventListenerconstruction usescloud_config_.bucket_prefix_andcloud_config_.object_path_When OSS URL is configured, these operations will search/operate in the wrong location (using legacy config instead of effective values).
Persist the effective values to class members (e.g., add
effective_bucket_name_,effective_bucket_prefix_,effective_object_path_,effective_endpoint_url_to the class) when parsing OSS URL, then update all later code to use these members instead ofcloud_config_.*.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/src/log_init.cpp(3 hunks)store_handler/eloq_data_store_service/README.md(3 hunks)store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md(1 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp(3 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp(6 hunks)store_handler/eloq_data_store_service/rocksdb_config.cpp(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md
- store_handler/eloq_data_store_service/rocksdb_config.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(94-179)
core/src/log_init.cpp (2)
store_handler/eloq_data_store_service/rocksdb_config.cpp (2)
CheckCommandLineFlagIsDefault(308-320)CheckCommandLineFlagIsDefault(308-308)core/include/data_substrate.h (1)
CheckCommandLineFlagIsDefault(311-322)
🔇 Additional comments (4)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
243-275: OSS URL parsing and precedence logic is correct.The implementation correctly:
- Parses the OSS URL when provided
- Validates the URL format with descriptive error messages
- Overrides legacy configuration (bucket_name, bucket_prefix, object_path, s3_endpoint_url)
- Clears bucket_prefix as expected for URL-based config
- Logs the effective configuration
core/src/log_init.cpp (1)
83-88: LGTM!The new
txlog_rocksdb_cloud_object_store_service_urlflag is well-defined with clear documentation of supported formats and precedence semantics.store_handler/eloq_data_store_service/README.md (1)
134-309: Excellent documentation of the OSS URL migration.The migration guide is comprehensive and well-structured:
- Clear rationale for the new URL-based configuration
- Multiple practical examples covering AWS S3, MinIO, HTTPS endpoints, GCS, and txlog
- Important notes on precedence, bucket_prefix deprecation, backward compatibility, and protocol support
- Helpful validation guidance
This will significantly aid users in migrating to the new configuration format.
store_handler/eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
238-239: Good: SSL verification no longer unconditionally disabled.The comment documents that SSL verification can be disabled for test environments, but the code no longer does this by default. This is a security improvement over previous versions.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
137-152: Missing validation for empty endpoint host in http/https URLs.This issue was flagged in a previous review. URLs like
https:///bucket/pathwould pass validation but produce an empty host. Thefirst_slash == 0case should be rejected.core/src/log_init.cpp (1)
446-458: Avoid logging raw OSS URL - potential credential exposure.This was flagged in a previous review. The raw
oss_url_may contain userinfo or query params for http(s) URLs. Since you already log bucket, object_path, and endpoint separately, remove the raw URL from the log message.
🧹 Nitpick comments (1)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
74-82: Code duplication withrocksdb_cloud_dump.cpp.This struct and the
ParseS3Urlfunction below are nearly identical to those instore_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp(lines 126-214). Consider extracting these into a shared header to avoid maintaining two copies.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/src/log_init.cpp(3 hunks)store_handler/eloq_data_store_service/data_store_factory.h(1 hunks)store_handler/eloq_data_store_service/data_store_service.cpp(1 hunks)store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h(2 hunks)store_handler/eloq_data_store_service/rocksdb_config.cpp(2 hunks)store_handler/eloq_data_store_service/rocksdb_config.h(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- store_handler/eloq_data_store_service/rocksdb_config.cpp
- store_handler/eloq_data_store_service/data_store_service.cpp
- store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/log_init.cpp (2)
core/include/data_substrate.h (1)
CheckCommandLineFlagIsDefault(311-322)store_handler/eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(94-182)
store_handler/eloq_data_store_service/rocksdb_config.h (1)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (2)
ParseS3Url(127-215)ParseS3Url(127-127)
store_handler/eloq_data_store_service/data_store_factory.h (1)
tx_service/include/store/data_store_handler.h (12)
bool(98-101)bool(103-106)bool(214-223)bool(225-229)bool(235-240)bool(366-371)bool(381-388)bool(390-394)bool(396-400)bool(410-413)bool(415-418)bool(435-438)
🔇 Additional comments (4)
store_handler/eloq_data_store_service/data_store_factory.h (1)
101-110: LGTM!The new virtual method follows the established pattern of the base class, providing a sensible default (
false) that derived factories can override when OSS URL configuration is in use.store_handler/eloq_data_store_service/rocksdb_config.h (1)
202-213: LGTM!The
oss_url_member andIsOssUrlConfigured()method are cleanly implemented, providing a clear way to check if OSS URL configuration is active.core/src/log_init.cpp (2)
83-89: LGTM!The flag definition is well-documented with clear format examples and precedence semantics.
302-309: LGTM!The configuration loading follows the established precedence pattern (command-line flag over config file).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (3)
82-109: CLI help/usage still reads "legacy-first"; update to make object_store_service_url the primary path.The usage header (lines 86-87) prominently displays
--bucket_name=BUCKET --bucket_prefix=PREFIX, while--object_store_service_urlappears later in the options list. Since the OSS URL is now the preferred configuration method, consider restructuring the usage message to feature it prominently in the header and grouping legacy flags in a separate "Legacy Options" section.
342-407: Security: HTTPS connections will not verify SSL certificates.Line 390 unconditionally disables SSL verification (
config.verifySSL = false;) even when the endpoint uses HTTPS. This creates a man-in-the-middle vulnerability for HTTPS connections, which is especially problematic now that--object_store_service_urlexplicitly supportshttps://URLs.While the code correctly detects HTTPS endpoints (lines 368-370), it then disables verification regardless. If insecure mode is needed for local testing, gate it behind an explicit command-line flag (e.g.,
--insecure_skip_verify_ssl) rather than unconditionally disabling verification.Add a new flag and conditional logic:
+DEFINE_bool(insecure_skip_verify_ssl, false, "Skip SSL certificate verification for HTTPS (INSECURE - use only for testing)"); + // Define command line flags DEFINE_string(aws_access_key_id, "", "AWS access key ID");Then in the client factory:
{ config.scheme = Aws::Http::Scheme::HTTPS; + config.verifySSL = !FLAGS_insecure_skip_verify_ssl; } else { config.scheme = Aws::Http::Scheme::HTTP; + config.verifySSL = false; // No SSL for HTTP } - // Disable SSL verification for HTTPS - config.verifySSL = false;
111-230: Eliminate code duplication: reuse ParseS3Url from rocksdb_config.h.The
ParseS3Urlfunction andS3UrlComponentsstruct are duplicated verbatim fromstore_handler/eloq_data_store_service/rocksdb_config.h. This creates maintenance burden—bug fixes or enhancements must be applied in multiple places.Remove the duplicate definition and include
rocksdb_config.hinstead to referenceEloqDS::ParseS3Url.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp(6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
📚 Learning: 2025-12-16T09:47:03.594Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:03.594Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
Applied to files:
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp
🔇 Additional comments (4)
store_handler/eloq_data_store_service/rocksdb_cloud_dump.cpp (4)
47-67: Good flag structure with clear legacy markers.The flag definitions appropriately mark legacy options and clearly indicate that
object_store_service_urltakes precedence. This helps users migrate to the new configuration approach.
257-292: Well-implemented OSS URL configuration with proper precedence and security.The implementation correctly:
- Gives
object_store_service_urlprecedence over legacy flags- Validates the parsed URL before using it
- Sets
bucket_prefixto empty (as required when using OSS URL per learnings)- Logs only parsed components (lines 278-283) without exposing the raw URL, which addresses the previous security concern about credential leakage in http(s) URLs
529-534: Appropriate AWS SDK configuration for a utility tool.The AWS SDK initialization correctly enables the SigPipe handler (line 531) to prevent crashes from broken connections, and sets Info-level logging (lines 532-533) which is suitable for a diagnostic utility.
557-565: Symmetric bucket configuration for source and destination.The code correctly configures both source and destination buckets with the same parameters, which is appropriate for this dump utility. When
object_store_service_urlis used,bucket_prefixis empty (as set in line 274), aligning with the expected behavior per learnings.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Documentation
Bug Fixes / Reliability
✏️ Tip: You can customize this high-level summary in your review settings.