fix(spark): strip recognized typed options from Rust storage_options map#520
Open
LuciferYang wants to merge 1 commit into
Open
fix(spark): strip recognized typed options from Rust storage_options map#520LuciferYang wants to merge 1 commit into
LuciferYang wants to merge 1 commit into
Conversation
LanceSparkReadOptions.Builder.fromOptions saves the entire input map as storageOptions before parseTypedFlags promotes recognized keys to their dedicated builder fields, so typed connector-level knobs (path, pushDownFilters, block_size, version, index_cache_size, metadata_cache_size, batch_size, topN_push_down, nearest, executor_credential_refresh) were leaking into the Rust-side storage_options map, which is reserved for object-store credentials and endpoint config (aws_*, gcs_*, allow_http, ...). LanceSparkWriteOptions had the same pattern for write_mode, max_row_per_file, max_rows_per_group, max_bytes_per_file, file_format_version, use_queued_write_buffer, queue_depth, batch_size, enable_stable_row_ids, use_large_var_types, max_batch_bytes, and blob_pack_file_size_threshold. The Rust layer silently drops unknown keys, so no functional breakage — this is debug-hygiene only. Cleanup surfaced while investigating how typed read options flow through spark.sql.catalog.<name>.<key> catalog- level configuration introduced by lance-format#476: the catalog-level path works, but recognized typed keys also end up in the native storage_options map, adding noise to storage-layer logs and debug output. Introduce RECOGNIZED_TYPED_KEYS sets on both options classes and strip them in Builder.build(). Stripping in build() (not inside parseTypedFlags) preserves the fromOptions -> withCatalogDefaults merge semantics: the chain can re-parse typed keys from a still-populated storageOptions when per-read options merge over catalog defaults, and only the final post-merge state is cleaned.
bc2c3ee to
8c63dd5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
LanceSparkReadOptions.Builder.fromOptionssaves the entire input map asstorageOptionsbeforeparseTypedFlagspromotes recognized keys to their dedicated builder fields. Typed connector options (path,batch_size,block_size,version,executor_credential_refresh, ...) therefore appear in the map forwarded to the Rustobject_storelayer viaReadOptions.setStorageOptions. The native layer silently drops unknown keys, so there is no user-visible breakage — but the native log carries entries that do not belong there and confuse storage-layer troubleshooting.LanceSparkWriteOptionshas the same pattern forwrite_mode,max_row_per_file,batch_size,file_format_version, and nine others.Change
Introduce a
RECOGNIZED_TYPED_KEYSset on each options class and remove those keys fromstorageOptionsat the end ofBuilder.build(). Stripping inbuild()rather than insideparseTypedFlagspreserves thefromOptions → withCatalogDefaultsmerge semantics: catalog defaults merged after a per-read.option(...)still need to re-parse typed keys from a populated map, so only the final post-merge state is cleaned.Surfaced while tracing how
spark.sql.catalog.<name>.<key>configs flow throughwithCatalogDefaults(#476).Test plan
LanceSparkReadOptionsTypedKeyStrippingTestcovering each typed key, catalog defaults, per-read overrides, and passthrough for genuine storage keys.LanceSparkWriteOptionsTypedKeyStrippingTestmirroring read-side coverage.