bump Spark version "3.0.0" -> "3.2.0"#212

Open
satyakommula96 wants to merge 5 commits into databricks:master from satyakommula96:master

@satyakommula96

No description provided.

satyakommula96 and others added 5 commits January 17, 2022 18:35
…port (#4)

* feat: migrate spark-sql-perf to Spark 3.x with Iceberg/Delta support

BREAKING CHANGES (Spark 2.x → 3.x):
- Replace all SQLContext APIs with SparkSession equivalents
- Replace sqlContext.implicits with spark.implicits across all modules
- Replace deprecated createExternalTable with spark.catalog.createTable
- Replace setConf/getAllConfs with spark.conf.set/spark.conf.getAll
- Replace createDataFrame/range/sparkContext calls with spark.* equivalents

Tables.scala (core changes):
- Rebuild createExternalTable() using explicit SQL DDL (CREATE EXTERNAL TABLE)
  to correctly handle partitioned and non-partitioned external tables
- Add isPartitioned flag (default: false) to createExternalTable/createExternalTables
  - isPartitioned=false: flat external table, reads all files (safe default)
  - isPartitioned=true: adds PARTITIONED BY + runs MSCK REPAIR TABLE
- Add Delta Lake and Iceberg to supported formats
- Skip MSCK REPAIR TABLE for delta/iceberg (they manage their own partitioning)
- Add try/catch around partition discovery for graceful degradation
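
The Tables.scala behaviour listed above could look roughly like the following. This is a minimal sketch, not the code in this PR: the object name `ExternalTableDdlSketch`, the function names, the parameter names, and the injected `runSql` function are all hypothetical, and Spark itself is left out so the logic can be read in isolation. It only renders Hive-style `STORED AS` DDL; the clause the PR actually emits for Delta and Iceberg tables is not shown in the commit message, so it is not guessed at here.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch of the DDL and partition-discovery rules described above.
object ExternalTableDdlSketch {
  // Formats the commit message says manage their own partitioning.
  private val selfManagedFormats = Set("delta", "iceberg")

  /** Builds a Hive-style CREATE EXTERNAL TABLE statement (illustrative only). */
  def buildDdl(
      table: String,
      columns: Seq[(String, String)], // (column name, SQL type)
      partitionColumns: Seq[String],
      format: String,
      location: String,
      isPartitioned: Boolean = false): String = {
    val usePartitions = isPartitioned && partitionColumns.nonEmpty
    // Partition columns move out of the column list and into PARTITIONED BY.
    val (partCols, dataCols) =
      if (usePartitions) columns.partition { case (name, _) => partitionColumns.contains(name) }
      else (Seq.empty[(String, String)], columns)
    def render(cols: Seq[(String, String)]): String =
      cols.map { case (name, tpe) => s"`$name` $tpe" }.mkString(", ")
    val partitionClause =
      if (usePartitions) s" PARTITIONED BY (${render(partCols)})" else ""
    s"CREATE EXTERNAL TABLE `$table` (${render(dataCols)})$partitionClause " +
      s"STORED AS ${format.toUpperCase} LOCATION '$location'"
  }

  /** True only for partitioned tables in formats that need MSCK REPAIR TABLE. */
  def needsRepair(format: String, isPartitioned: Boolean): Boolean =
    isPartitioned && !selfManagedFormats.contains(format.toLowerCase)

  /** Runs partition discovery via `runSql`; a failure is logged and reported as false. */
  def repairPartitions(table: String, format: String, isPartitioned: Boolean)(
      runSql: String => Unit): Boolean =
    if (!needsRepair(format, isPartitioned)) false
    else
      Try(runSql(s"MSCK REPAIR TABLE `$table`")) match {
        case Success(_) => true
        case Failure(e) =>
          Console.err.println(s"Partition discovery failed for $table: ${e.getMessage}")
          false
      }
}
```

In a real session `runSql` would presumably be something like `sql => spark.sql(sql)`. Passing it in as a function keeps the sketch free of a Spark dependency and makes the "graceful degradation" path easy to exercise with a runner that throws.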

Benchmark.scala / Benchmarkable.scala:
- Add val spark = sqlContext.sparkSession accessor throughout
- Replace sqlContext.read.json with spark.read.json

MLLib / MLBenchContext / dataGeneration:
- Add SparkSession accessor (val spark = sqlContext.sparkSession)
- Replace sql.sparkContext / sql.createDataFrame with spark.sparkSession.* calls
- Modernize constructor initialization

GenTPCDSData.scala:
- Update --format help text to include Delta and Iceberg as valid options

Tooling:
- Add sbt-scalafmt 2.5.2 plugin to project/plugins.sbt
- Add .scalafmt.conf (Scala 2.12, maxColumn=100, import sorting)
- Apply scalafmt formatting to all 63 Scala sources

Build: verified sbt compile + sbt assembly succeed on Spark 3.5.1

* ci: add scalafmtCheck job to GitHub Actions workflow