
fix(cdc_snapshot): add Spark binaryFile fallback for dbutils.fs.ls() and fix parquet directory enumeration in historical CDC snapshot#83

Merged
haillew merged 6 commits into main from fix/cdc-snapshot-dbutils-fallback-and-parquet-dir-enumeration on May 28, 2026


@rederik76
Collaborator

  • _list_files now tries dbutils.fs.ls() first and falls back to Spark's
    binaryFile reader on any exception (e.g. Py4JSecurityException on
    Serverless with Restricted Access / SEG)
  • Fix a bug where the listing was invoked as dbutils.fs().ls(), with
    stray parentheses after fs
  • The binaryFile fallback stops at .parquet directories and deduplicates
    part files, so each snapshot version is counted once
  • The dbutils path also guards against recursing into .parquet
    directories (the trailing "/" is stripped before the endswith check)
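
For reviewers who want the behaviour in one place, here is a minimal sketch of the listing logic described above. It is not the code from this PR: `list_files`, `_walk`, `is_parquet_dir`, `collapse_part_files`, and the `Entry` stand-in are hypothetical names, and the two listing backends are passed in as callables so the sketch runs without a Databricks cluster.

```python
from typing import Callable, Iterable, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for the entries dbutils.fs.ls() returns."""
    path: str
    is_dir: bool

    def isDir(self) -> bool:
        return self.is_dir


def is_parquet_dir(path: str) -> bool:
    # Directory paths may carry a trailing "/", so strip it before the check.
    return path.rstrip("/").endswith(".parquet")


def collapse_part_files(file_paths: Iterable[str]) -> List[str]:
    """Map each part file to its enclosing .parquet directory, once each."""
    seen = {}  # dict keeps insertion order, so the result is stable
    for file_path in file_paths:
        parts = file_path.split("/")
        collapsed = file_path
        # Look only at parent segments; a standalone .parquet file is kept.
        for i, segment in enumerate(parts[:-1]):
            if segment.endswith(".parquet"):
                collapsed = "/".join(parts[: i + 1])
                break
        seen.setdefault(collapsed, None)
    return list(seen)


def _walk(path: str, ls: Callable[[str], Iterable[Entry]]) -> List[str]:
    found = []
    for entry in ls(path):
        if entry.isDir() and not is_parquet_dir(entry.path):
            found.extend(_walk(entry.path, ls))  # ordinary directory: recurse
        else:
            # Plain file, or a .parquet directory treated as one snapshot.
            found.append(entry.path.rstrip("/"))
    return found


def list_files(
    root: str,
    ls: Callable[[str], Iterable[Entry]],
    fallback: Callable[[str], Iterable[str]],
) -> List[str]:
    """Try the dbutils-style listing first; on any exception use the fallback."""
    try:
        return _walk(root, ls)
    except Exception:
        return collapse_part_files(fallback(root))
```

On a cluster, `ls` would presumably be `dbutils.fs.ls`. One way to build `fallback`, assuming the standard binaryFile data source, is `lambda root: [r.path for r in spark.read.format("binaryFile").option("recursiveFileLookup", "true").load(root).select("path").collect()]`; selecting only `path` avoids pulling file contents to the driver.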

rederik76 added 2 commits May 20, 2026 14:22
…t directory enumeration in historical CDC snapshot

rederik76 requested a review from haillew May 26, 2026 07:05
haillew merged commit bb91ba6 into main May 28, 2026
haillew deleted the fix/cdc-snapshot-dbutils-fallback-and-parquet-dir-enumeration branch May 28, 2026 02:49

2 participants