fix(spark): gss initiate failed on hms executors; spark.sql.catalog read options not applied#476

Merged: fangbo merged 1 commit into lance-format:main from xiaguanglei:fix/executor-credential-refresh on May 7, 2026

@xiaguanglei xiaguanglei commented Apr 23, 2026

Summary

Fixes org.apache.thrift.transport.TTransportException: GSS initiate failed thrown on Spark executors when reading Lance tables registered in a Kerberized Hive Metastore (both plain SELECT and SQL DML).

This PR does two things:

  1. Stops executors from rebuilding the namespace client unconditionally. Adds a new read option executor_credential_refresh (default true, preserving current behavior). When set to false, executors skip the eager namespace.describeTable() RPC and open the dataset directly by URI using the storage options the driver already obtained.
  2. Makes catalog-level read options actually reach the typed fields. Catalog-level conf (--conf spark.sql.catalog.<name>.executor_credential_refresh=false) is now parsed in withCatalogDefaults(), so spark.sql(...) queries (including SELECT and SQL DML) — which have no spark.read.option(...) attach point — pick up the flag the same way as DataFrameReader-based reads.

Root Cause

Since #353 removed LanceDatasetCache, LanceFragmentScanner.create() unconditionally rebuilds the LanceNamespace client on each Spark executor and binds it back onto LanceSparkReadOptions. This forces the dataset open through Utils.OpenDatasetBuilder's namespaceClient branch, which in turn calls OpenDatasetBuilder.buildFromNamespaceClient() in the Lance Java SDK — and that path issues an eager namespace.describeTable() RPC before handing off to Rust.

For catalogs where the backing service authenticates per-call (HMS over Kerberos, some REST catalogs), Spark executors typically do not have a Kerberos TGT — the --keytab / --principal credentials only reach the driver / ApplicationMaster, while executors run with Hadoop delegation tokens that cannot be used for HMS Thrift SASL. The describeTable RPC therefore fails with:

org.apache.thrift.transport.TTransportException: GSS initiate failed
  at org.lance.namespace.hive2.Hive2ClientPool.newClient(Hive2ClientPool.java:42)
  at org.lance.namespace.hive2.Hive2Namespace.describeTable(Hive2Namespace.java:285)
  at org.lance.OpenDatasetBuilder.buildFromNamespaceClient(OpenDatasetBuilder.java:205)
  at org.lance.OpenDatasetBuilder.build(OpenDatasetBuilder.java:191)
  at org.lance.spark.utils.Utils$OpenDatasetBuilder.build(Utils.java:140)
  at org.lance.spark.internal.LanceFragmentScanner.create(LanceFragmentScanner.java:67)

Driver-side operations (metadata-only queries, count-via-manifest) succeed because the driver has the TGT. The failure only manifests during fragment scans.

Why the Existing Behavior Exists

Rebuilding the namespace client on the executor is not dead code — it keeps the Rust LanceNamespaceStorageOptionsProvider attached so that short-lived vended credentials (STS tokens for S3 / GCS / Azure) returned by describeTable() can be refreshed when they expire mid-scan. Simply removing the rebuild would break long-running scans against object stores that use credential vending.

Fix

1. Gate the executor-side rebuild behind a new option

Add a boolean read option executor_credential_refresh, defaulting to true:

  • true (default): unchanged — executor rebuilds the namespace client and routes through the namespaceClient branch, preserving credential refresh. Safe for all existing users.
  • false: the executor skips the rebuild and opens the dataset directly by URI, using the initialStorageOptions the driver already obtained from describeTable() at scan-plan time.
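
The gate described above can be sketched in isolation. This is a minimal illustration, not the actual LanceFragmentScanner code: the class, enum, and method names below are hypothetical stand-ins, and only the condition (namespace configured and executor_credential_refresh true) comes from this PR.

```java
/** Hypothetical stand-in for the executor-side decision; not lance-spark code. */
public class ExecutorOpenSketch {

    /** The two ways an executor could open the dataset. */
    enum OpenPath {
        NAMESPACE_CLIENT, // rebuild the namespace client; describeTable() is called
        DIRECT_URI        // open by URI with the driver-provided storage options
    }

    /**
     * Rebuild the namespace client only when a namespace is configured
     * AND executor_credential_refresh is true (the default).
     */
    static OpenPath choosePath(boolean hasNamespace, boolean executorCredentialRefresh) {
        if (hasNamespace && executorCredentialRefresh) {
            return OpenPath.NAMESPACE_CLIENT;
        }
        return OpenPath.DIRECT_URI;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(true, true));
        System.out.println(choosePath(true, false));
    }
}
```

With the flag set to false, the executor never reaches the namespace branch, so no describeTable() RPC is issued from a process that lacks a TGT.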

2. Make catalog-level conf actually reach the typed field

Before this PR, Builder.withCatalogDefaults(catalogConfig) only merged the storage-options map and never parsed typed flags. As a result, the catalog-conf syntax looked like it should work but silently ignored the flag. This PR extracts a parseTypedFlags(Map<String, String>) helper and calls it from both fromOptions() and withCatalogDefaults(), so every recognized read option (not just executor_credential_refresh) now flows from catalog conf into the typed field.

This is what makes the fix usable from SQL DML. Without the withCatalogDefaults parse, a user running DELETE FROM kerberized_hms_lance_table WHERE id = 1 has no way to disable the rebuild — SQL DML has no per-statement .option(...) attach point.

Configuration surfaces after this PR

| Surface | Example | Works? |
| --- | --- | --- |
| Per-read option | spark.read.option("executor_credential_refresh", "false").table(...) (DataFrameReader) | Yes (already worked before this PR; not available for spark.sql("SELECT ...")) |
| Catalog conf + plain SELECT | --conf spark.sql.catalog.lance.executor_credential_refresh=false + spark.sql("SELECT * FROM lance.db.t") | Yes (fixed by this PR) |
| Catalog conf + SQL DML | --conf spark.sql.catalog.lance.executor_credential_refresh=false + spark.sql("DELETE FROM lance.db.t WHERE id=1") | Yes (fixed by this PR) |

Intended usage for HMS + Kerberos deployments:

spark-submit ... \
  --keytab /etc/keytabs/my.keytab \
  --principal my/principal@REALM \
  --conf spark.sql.catalog.lance.executor_credential_refresh=false
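
For readers unfamiliar with how such a --conf entry becomes a read option: as far as I understand Spark's catalog plugin loading, entries under spark.sql.catalog.<name>. are handed to the catalog with that prefix stripped. The sketch below mimics that mapping in plain Java. It is an illustration under that assumption, not Spark's or lance-spark's code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of catalog-scoped conf keys; not Spark code. */
public class CatalogConfSketch {

    /** Returns the entries under spark.sql.catalog.<catalogName>. with the prefix removed. */
    static Map<String, String> catalogOptions(String catalogName, Map<String, String> sparkConf) {
        String prefix = "spark.sql.catalog." + catalogName + ".";
        Map<String, String> options = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                options.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.sql.catalog.lance.executor_credential_refresh", "false");
        conf.put("spark.app.name", "demo"); // unrelated key, ignored
        System.out.println(catalogOptions("lance", conf));
    }
}
```

The resulting map is what a catalog-defaults parse would see, which is why the flag needs no per-statement attach point.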

Per-Namespace Trade-off Analysis

The refresh callback is meaningful only for namespaces that actually return storage_options from describeTable(). Survey of the impls in lance-namespace-impls:

| Namespace | describeTable() populates storage_options? | Cost of executor_credential_refresh=false |
| --- | --- | --- |
| Hive2Namespace | No (setLocation only) | None. The refresh callback is a no-op for HMS regardless of underlying storage. |
| Hive3Namespace | No (setLocation only) | None. Same as Hive2. |
| GlueNamespace | Static config.getStorageOptions() | Effectively none for plain Glue. Use the default if you rely on LakeFormation vended creds. |
| IcebergNamespace (REST) | Yes (vended creds typical) | Long scans against vended creds will fail when the credential expires. |
| PolarisNamespace | Yes (vended creds typical) | Same as Iceberg REST. |
| UnityNamespace | Yes (Databricks-vended temp creds) | Same as Iceberg REST. |

Concretely, for the HMS + S3 case: HMS does not vend S3 credentials (describeTable() only sets location), so the executor's S3 access is governed entirely by the AWS SDK credential chain (instance profile / hive-site.xml / env vars / ~/.aws/credentials) and the AWS SDK handles all STS rotation independently. The Lance refresh callback would have nothing to refresh, so disabling it costs nothing in practice.

Scope of Change

  • LanceSparkReadOptions.java:
    • New constant CONFIG_EXECUTOR_CREDENTIAL_REFRESH, new field executorCredentialRefresh (default true), builder / getter / withVersion propagation / equals / hashCode, Javadoc covering per-namespace trade-off.
    • Extracted Builder.parseTypedFlags(Map<String, String>) helper from the previously duplicated fromOptions body, now called from both fromOptions() and withCatalogDefaults(). This incidentally also fixes silent ignores of push_down_filters, batch_size, topN_push_down, etc. when set at the catalog level — a pre-existing latent issue uncovered while fixing the primary bug.
  • LanceFragmentScanner.java: add && readOptions.isExecutorCredentialRefresh() to the existing rebuild if, inline comment explaining the trade-off.
  • LanceSparkReadOptionsSerializationTest.java: six new tests covering default value, map parsing, serialization round-trip, withVersion propagation, catalog-defaults path, and per-read override precedence.

No public API signature is changed; no existing behavior is altered for users who do not set the new option.

Test Plan

New unit tests (all 305 tests in lance-spark-base_2.12 pass locally):

  • testExecutorCredentialRefreshDefaultsToTrue — default value preserved.
  • testExecutorCredentialRefreshParsedFromOptions — flag honored from both "true" and "false" map entries.
  • testExecutorCredentialRefreshSurvivesSerialization — flag survives Java serialization (critical: it must reach the executor).
  • testExecutorCredentialRefreshPreservedByWithVersion — flag propagated by withVersion() used during scan-plan version pinning.
  • testExecutorCredentialRefreshFromCatalogDefaults — new; guards the catalog-conf path used by SQL DML.
  • testPerReadOptionOverridesCatalogDefaults — new; pins the precedence rule "per-read .option(...) wins over catalog default".
  • LanceFragmentScannerTest.java: one new end-to-end test that locks in the executor-branch contract through LanceFragmentScanner.create (per reviewer suggestion).
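
The serialization test matters because the options object is shipped from driver to executor. A minimal sketch of what such a round-trip check exercises, using a hypothetical ReadOptions stand-in rather than the real LanceSparkReadOptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical round-trip check; ReadOptions is a stand-in, not the real class. */
public class SerializationRoundTripSketch {

    static final class ReadOptions implements Serializable {
        private static final long serialVersionUID = 1L;
        private final boolean executorCredentialRefresh;

        ReadOptions(boolean executorCredentialRefresh) {
            this.executorCredentialRefresh = executorCredentialRefresh;
        }

        boolean isExecutorCredentialRefresh() {
            return executorCredentialRefresh;
        }
    }

    /** Serializes and deserializes with plain Java serialization. */
    static ReadOptions roundTrip(ReadOptions original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ReadOptions) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new ReadOptions(false)).isExecutorCredentialRefresh());
    }
}
```

If the field were transient or dropped in a copy constructor, the executor would silently fall back to the default (true) and the GSS failure would return.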

Integration test (out-of-band, on internal YARN + HMS + Kerberos + HDFS cluster):

  1. spark-submit with --keytab / --principal, Kerberized HMS, SELECT * FROM lance_hms_table via lance-namespace-hive2.
  2. Before fix: executor task fails with GSS initiate failed on describeTable. Reproducible across multiple partitions / runs.
  3. After fix with --conf spark.sql.catalog.lance.executor_credential_refresh=false: scan completes, returns expected row count and sample rows.
  4. Default path (true) with the same fix jar: behavior unchanged.

Backward Compatibility

Default is true, so every existing job behaves identically without touching configs. Only users who explicitly set the new option to false opt into the new path.

@github-actions github-actions Bot added the bug Something isn't working label Apr 23, 2026
@xiaguanglei
Contributor Author

Related: #353. cc @LuciferYang @hamersaw, could you please take a look? Thank you.

@LuciferYang
Contributor

LuciferYang commented Apr 24, 2026

This PR documents three ways to set executor_credential_refresh:

| Route | Status |
| --- | --- |
| --conf spark.sql.catalog.<name>.executor_credential_refresh=false | Broken: catalog conf never reaches the typed field |
| spark.read.option("executor_credential_refresh", "false").table(...) | Works for plain SELECT |
| SQL DML (DELETE / UPDATE / MERGE INTO) | No escape exists: SQL DML has no .option(...) attach point, and the catalog conf is broken |

Why the catalog conf is broken.
In lance-spark-base_2.12/src/main/java/org/lance/spark/LanceSparkReadOptions.java, Builder.withCatalogDefaults() (lines 489-495) only merges storageOptions — it never parses typed flags. So executor_credential_refresh set at the catalog level lands in the raw-options map but never makes it into the typed executorCredentialRefresh field.

Net effect. A user running DELETE FROM kerberized_hms_lance_table WHERE id = 1 cannot disable the rebuild on any Spark version, and GSS initiate failed still fires. The PR fixes the GSS bug for SELECT but not for DML.

Required changes (one file: LanceSparkReadOptions.java)

  1. Extract a private helper in Builder:

    private void parseTypedFlags(Map<String, String> opts) {
        if (opts.containsKey(CONFIG_PUSH_DOWN_FILTERS)) {
            this.pushDownFilters = Boolean.parseBoolean(opts.get(CONFIG_PUSH_DOWN_FILTERS));
        }
        if (opts.containsKey(CONFIG_BLOCK_SIZE)) {
            this.blockSize = Integer.parseInt(opts.get(CONFIG_BLOCK_SIZE));
        }
        if (opts.containsKey(CONFIG_VERSION)) {
            this.version = Integer.parseInt(opts.get(CONFIG_VERSION));
        }
        if (opts.containsKey(CONFIG_INDEX_CACHE_SIZE)) {
            this.indexCacheSize = Integer.parseInt(opts.get(CONFIG_INDEX_CACHE_SIZE));
        }
        if (opts.containsKey(CONFIG_METADATA_CACHE_SIZE)) {
            this.metadataCacheSize = Integer.parseInt(opts.get(CONFIG_METADATA_CACHE_SIZE));
        }
        if (opts.containsKey(CONFIG_BATCH_SIZE)) {
            int parsedBatchSize = Integer.parseInt(opts.get(CONFIG_BATCH_SIZE));
            Preconditions.checkArgument(parsedBatchSize > 0, "batch_size must be positive");
            this.batchSize = parsedBatchSize;
        }
        if (opts.containsKey(CONFIG_TOP_N_PUSH_DOWN)) {
            this.topNPushDown = Boolean.parseBoolean(opts.get(CONFIG_TOP_N_PUSH_DOWN));
        }
        if (opts.containsKey(CONFIG_NEAREST)) {
            nearest(opts.get(CONFIG_NEAREST));
        }
        if (opts.containsKey(CONFIG_EXECUTOR_CREDENTIAL_REFRESH)) {
            this.executorCredentialRefresh =
                Boolean.parseBoolean(opts.get(CONFIG_EXECUTOR_CREDENTIAL_REFRESH));
        }
    }
  2. Replace the inline typed-flag parses in fromOptions() (lines 453-479) with a single call:

    public Builder fromOptions(Map<String, String> options) {
        this.storageOptions = new HashMap<>(options);
        parseTypedFlags(options);
        return this;
    }
  3. Add the same call to withCatalogDefaults() after the merge:

    public Builder withCatalogDefaults(LanceSparkCatalogConfig catalogConfig) {
        Map<String, String> merged = new HashMap<>(catalogConfig.getStorageOptions());
        merged.putAll(this.storageOptions);
        this.storageOptions = merged;
        parseTypedFlags(catalogConfig.getStorageOptions());  // NEW
        return this;
    }

After these edits, --conf spark.sql.catalog.<name>.executor_credential_refresh=false works uniformly for plain SELECT, per-read .option(...), and SQL DML. No changes to LancePositionDeltaOperation.java are required.

Please also update the PR description's "Intended usage" example — the catalog-conf syntax will finally work after this edit, so the example is correct once the edit lands.

@xiaguanglei xiaguanglei force-pushed the fix/executor-credential-refresh branch 2 times, most recently from b832b6f to fb46a06 Compare April 24, 2026 13:36
@xiaguanglei
Contributor Author

@LuciferYang Thanks for the review and for catching the withCatalogDefaults issue — that was a great find, and it really helped tighten the fix.
I’ve now verified this on our YARN cluster: with --conf spark.sql.catalog.<name>.executor_credential_refresh=false, the option is picked up correctly and fragment scans complete successfully. So the flag works as intended and disables the executor-side namespace rebuild (the path that was hitting HMS and failing with GSS initiate failed on executors).
Could you please take another look at the updated PR when you have a moment?

@xiaguanglei xiaguanglei changed the title fix(spark): fix GSS initiate failed on executors when reading Lance tables via Hive Metastore fix(spark): fix GSS initiate failed on executors when reading Lance tables via Hive Metastore and catalog read options not taking effect Apr 24, 2026
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@xiaguanglei xiaguanglei changed the title fix(spark): fix GSS initiate failed on executors when reading Lance tables via Hive Metastore and catalog read options not taking effect fix(spark): GSS initiate failed on HMS executors; catalog read config not applied Apr 24, 2026
@xiaguanglei xiaguanglei changed the title fix(spark): GSS initiate failed on HMS executors; catalog read config not applied fix(spark): gss initiate failed on hms executors; spark.sql.catalog read options not applied Apr 24, 2026

@LuciferYang LuciferYang left a comment

Only one suggestion regarding testing.

cc @hamersaw @fangbo for further review

@xiaguanglei xiaguanglei force-pushed the fix/executor-credential-refresh branch from fb46a06 to d5a3742 Compare April 27, 2026 09:10
Map<String, String> merged = new HashMap<>(catalogConfig.getStorageOptions());
merged.putAll(this.storageOptions);
this.storageOptions = merged;
parseTypedFlags(catalogConfig.getStorageOptions());
Collaborator

L533-L534 duplicate what fromOptions already does. Can we call fromOptions(merged) instead?

Contributor Author

Adopted: withCatalogDefaults now delegates to fromOptions(merged) — single source of truth, identical behavior on every existing call site.
After rebasing onto main, I noticed that upstream (#487) landed LanceRuntime.useNamespaceOnWorkers(...), a static per-impl allowlist that currently only marks "glue" as false. For hive2/hive3 it still defaults to rebuilding the namespace on workers, so on Kerberized HMS the executor would still re-issue describeTable() and hit GSS initiate failed. executor_credential_refresh=false remains the per-job/per-catalog opt-out for those cases; it is complementary to the upstream allowlist, not redundant. The original GSS scenario this PR targets is unaffected by the rebase, and all existing tests still pass.
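
The delegation adopted above, and the "per-read wins over catalog default" rule it preserves, can be sketched with a single flag. This is a simplified, hypothetical builder (one typed field, plain maps instead of LanceSparkCatalogConfig), not the real LanceSparkReadOptions.Builder:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical one-flag builder illustrating withCatalogDefaults -> fromOptions(merged). */
public class ReadOptionsBuilderSketch {
    static final String CONFIG_EXECUTOR_CREDENTIAL_REFRESH = "executor_credential_refresh";

    private Map<String, String> storageOptions = new HashMap<>();
    private boolean executorCredentialRefresh = true; // default preserves old behavior

    /** Stores the raw map and parses the typed flag from it. */
    ReadOptionsBuilderSketch fromOptions(Map<String, String> options) {
        this.storageOptions = new HashMap<>(options);
        if (options.containsKey(CONFIG_EXECUTOR_CREDENTIAL_REFRESH)) {
            this.executorCredentialRefresh =
                Boolean.parseBoolean(options.get(CONFIG_EXECUTOR_CREDENTIAL_REFRESH));
        }
        return this;
    }

    /** Catalog values are the base; per-read options already set overwrite them. */
    ReadOptionsBuilderSketch withCatalogDefaults(Map<String, String> catalogDefaults) {
        Map<String, String> merged = new HashMap<>(catalogDefaults);
        merged.putAll(this.storageOptions);
        return fromOptions(merged); // single parse path for both entry points
    }

    boolean isExecutorCredentialRefresh() {
        return executorCredentialRefresh;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put(CONFIG_EXECUTOR_CREDENTIAL_REFRESH, "false");
        boolean flag = new ReadOptionsBuilderSketch()
            .withCatalogDefaults(catalog)
            .isExecutorCredentialRefresh();
        System.out.println(flag);
    }
}
```

Because the merged map is built with per-read entries applied last, parsing it once yields the same precedence for the raw map and the typed field.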

…ptions not taking effect

Add executor_credential_refresh (default true). When false, skip executor-side namespace rebuild to avoid describeTable() on Kerberized HMS without a TGT (GSS initiate failed on fragment scans).

Parse typed read flags in Builder.withCatalogDefaults() so keys under spark.sql.catalog.<name> (e.g. executor_credential_refresh, batch_size, push_down_filters) apply to plain SQL and DML, not only DataFrameReader.option paths.
@xiaguanglei xiaguanglei force-pushed the fix/executor-credential-refresh branch from d5a3742 to 0263227 Compare April 29, 2026 08:05
@xiaguanglei
Contributor Author

@fangbo @hamersaw Hello, do you have any further comments on this PR?

@fangbo
Collaborator

fangbo commented May 7, 2026

+1 LGTM

@fangbo fangbo merged commit b00c69a into lance-format:main May 7, 2026
17 checks passed
LuciferYang added a commit to LuciferYang/lance-spark that referenced this pull request May 12, 2026
LanceSparkReadOptions.Builder.fromOptions saves the entire input map as
storageOptions before parseTypedFlags promotes recognized keys to their
dedicated builder fields, so typed connector-level knobs (path,
pushDownFilters, block_size, version, index_cache_size,
metadata_cache_size, batch_size, topN_push_down, nearest,
executor_credential_refresh) were leaking into the Rust-side
storage_options map, which is reserved for object-store credentials and
endpoint config (aws_*, gcs_*, allow_http, ...). LanceSparkWriteOptions
had the same pattern for write_mode, max_row_per_file,
max_rows_per_group, max_bytes_per_file, file_format_version,
use_queued_write_buffer, queue_depth, batch_size, enable_stable_row_ids,
use_large_var_types, max_batch_bytes, and blob_pack_file_size_threshold.

The Rust layer silently drops unknown keys, so no functional breakage —
this is debug-hygiene only. Cleanup surfaced while investigating how
typed read options flow through spark.sql.catalog.<name>.<key> catalog-
level configuration introduced by lance-format#476: the catalog-level path works,
but recognized typed keys also end up in the native storage_options
map, adding noise to storage-layer logs and debug output.

Introduce RECOGNIZED_TYPED_KEYS sets on both options classes and strip
them in Builder.build(). Stripping in build() (not inside
parseTypedFlags) preserves the fromOptions -> withCatalogDefaults merge
semantics: the chain can re-parse typed keys from a still-populated
storageOptions when per-read options merge over catalog defaults, and
only the final post-merge state is cleaned.
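
A rough sketch of the stripping step this commit message describes, assuming a plain map of options. RECOGNIZED_TYPED_KEYS is named in the message; the set contents here are only a subset of the read keys it lists, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of removing typed connector keys before the native layer. */
public class StripTypedKeysSketch {

    // Subset of the typed read keys listed in the commit message.
    static final Set<String> RECOGNIZED_TYPED_KEYS = new HashSet<>(Arrays.asList(
        "block_size", "version", "batch_size", "topN_push_down",
        "nearest", "executor_credential_refresh"));

    /** Returns a copy of the raw options without typed keys; the input is left untouched. */
    static Map<String, String> stripTypedKeys(Map<String, String> rawOptions) {
        Map<String, String> cleaned = new HashMap<>(rawOptions);
        cleaned.keySet().removeAll(RECOGNIZED_TYPED_KEYS);
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("allow_http", "true");                   // object-store setting, kept
        raw.put("batch_size", "512");                    // typed key, stripped
        raw.put("executor_credential_refresh", "false"); // typed key, stripped
        System.out.println(stripTypedKeys(raw));
    }
}
```

Copying before stripping mirrors the point made above: the still-populated raw map stays available for re-parsing during the merge, and only the final output is cleaned.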