[SPARK-53840][SQL] Add AS JSON output support for SHOW TABLES and SHOW TABLE EXTENDED#54824
[SPARK-53840][SQL] Add AS JSON output support for SHOW TABLES and SHOW TABLE EXTENDED#54824ayushbilala wants to merge 4 commits into
Conversation
91aeb50 to
625fbe2
Compare
|
@asl3 Have you had the opportunity to review this pull request yet? |
asl3
left a comment
There was a problem hiding this comment.
Thank you! Overall looks good to me.
Can you also add test coverage for temp views (uses a separate listTempViews call)?
|
@asl3 I have added tests for temp views. Can you please check? |
cloud-fan
left a comment
There was a problem hiding this comment.
Summary
This PR adds AS JSON output mode to SHOW TABLES and SHOW TABLE EXTENDED, threading an asJson: Boolean flag through the parser, logical plan nodes (ShowTables, ShowTablesExtended), resolution (ResolveSessionCatalog, DataSourceV2Strategy), and execution (ShowTablesCommand for V1, ShowTablesExec/ShowTablesExtendedExec for V2). When asJson=true, a single-row result with a json_metadata column is returned. PARTITION + AS JSON is rejected at parse time.
General comments
-
The JSON generation logic is duplicated across three execution classes (
ShowTablesCommand,ShowTablesExec,ShowTablesExtendedExec), and findings #1 and #2 below are direct consequences of that duplication — different listing strategies for temp views in V2, different ways to obtain the catalog name in V1. The existingDESCRIBE TABLE ... AS JSONavoids this entirely by using a separate command class (DescribeRelationJsonCommand) — a singleUnaryRunnableCommandthat pattern-matches on the already-resolved child and handles all catalog types in one place, with no need for separate V1 and V2 implementations. Consider following the same pattern: a singleShowTablesJsonCommandthat receives the resolved namespace, gets the catalog from it, and centralizes JSON generation. This eliminates the V1/V2 divergence that caused both bugs. -
No test checks for the absence of duplicate entries in JSON output — all tests use
.find()which returns the first match. Adding an assertion ontables.length(expected count) would catch duplicates.
| }.toList | ||
|
|
||
| val jsonTempViews = if (CatalogV2Util.isSessionCatalog(catalog)) { | ||
| val sessionCatalog = session.sessionState.catalog |
There was a problem hiding this comment.
catalog.listTables() via V2SessionCatalog calls SessionCatalog.listTables(db, "*", includeLocalTempViews=true) which already includes local temp views. This separate listTempViews() call adds them a second time, causing duplicate entries in the JSON output when the V2 session catalog is used (when useV1Command=false).
The non-JSON path at line 78 does not list temp views separately — it correctly relies on what catalog.listTables() returns.
There was a problem hiding this comment.
Good catch! I will remove the redundant sessionCatalog.listTempViews(...) block entirely.
There was a problem hiding this comment.
Fixed. JSON path has been moved to the new ShowTablesJsonExec which guards the separate listTempViews() call with !CatalogV2Util.isSessionCatalog(catalog), matching the logic of the non-JSON path.
When catalog is V2SessionCatalog, temp views are already included in listTables() results and the separate call is skipped.
| "catalog" -> JString( | ||
| sparkSession.sessionState.catalogManager.currentCatalog.name()), |
There was a problem hiding this comment.
currentCatalog returns the user's currently active catalog (set via USE), but ShowTablesCommand always operates on the session catalog (matched via ResolvedV1Database). If the user runs e.g. USE my_v2_catalog; SHOW TABLE EXTENDED IN spark_catalog.mydb LIKE '*' AS JSON, this reports "catalog": "my_v2_catalog" instead of "spark_catalog". The V2 exec correctly uses catalog.name() instead.
| "catalog" -> JString( | |
| sparkSession.sessionState.catalogManager.currentCatalog.name()), | |
| "catalog" -> JString( | |
| sparkSession.sessionState.catalogManager.v2SessionCatalog.name()), |
There was a problem hiding this comment.
Thanks for pointing out the scoping distinction with USE. I'll update the implementation.
There was a problem hiding this comment.
Fixed - now uses sparkSession.sessionState.catalogManager.v2SessionCatalog.name() as suggested. This correctly returns spark_catalog regardless of which catalog the user has set active via USE.
| "DESC TABLE COLUMN for a specific partition." | ||
| ] | ||
| }, | ||
| "SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION" : { |
There was a problem hiding this comment.
The UNSUPPORTED_FEATURE sub-classes are alphabetically ordered. SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION (S-H) is placed between DESC_TABLE_COLUMN_PARTITION (D-E) and DROP_DATABASE (D-R), but it should be placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.
There was a problem hiding this comment.
Fixed - SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION is now placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.
|
Addressed all review comments. Please take another pass when you get a chance. Thanks! |
| case Seq(db) => Some(db) | ||
| case _ => None | ||
| } | ||
| val tempViews = sessionCatalog.listTempViews(db.getOrElse(""), pattern) |
There was a problem hiding this comment.
For V2SessionCatalog, catalog.listTables(namespace.toArray) already includes local temp views (and globals when ns is global_temp) via SessionCatalog.listTables(db, "*", includeLocalTempViews=true).
This block lists them again, so each temp view is emitted twice: once with type=TABLE, isTemporary=false, once with type=VIEW, isTemporary=true. The same duplicate path was removed from ShowTablesExec.
Should either skip this listTempViews block (matching ShowTablesExec) and use an isTempView-style helper above, or filter temp views out of tables before emission.
There was a problem hiding this comment.
Fixed. JSON path has been extracted to ShowTablesJsonExec. There is a guard !CatalogV2Util.isSessionCatalog(catalog) to prevent the duplicate listTempViews() fetch when catalog is V2SessionCatalog.
Remaining code in ShowTablesExtendedExec is the text-only path, which is unchanged.
| val tables = catalog.listTables(namespace.toArray) | ||
| tables.foreach { tableIdent => | ||
| if (StringUtils.filterPattern(Seq(tableIdent.name()), pattern).nonEmpty) { | ||
| // V2 persistent tables are always TABLE |
There was a problem hiding this comment.
The comment "V2 persistent tables are always TABLE" and unconditional type=TABLE, isTemporary=false is wrong when catalog is V2SessionCatalog. Can use a per-row isTempView check (example: ShowTablesExec.isTempView)
There was a problem hiding this comment.
Fixed. In ShowTablesJsonExec, each identifier from listTables() / listTableAndViewSummaries() is now passed through isTempView(ident), which checks SessionCatalog.isTempView when catalog is V2SessionCatalog.
Temp views returned by listTables() are correctly identified and emitted with type=VIEW, isTemporary=true.
| Try(catalog.getTempViewOrPermanentTableMetadata(tableIdent)) match { | ||
| case Success(meta) if meta.tableType == CatalogTableType.VIEW => "VIEW" | ||
| case Success(_) => "TABLE" | ||
| case Failure(_) => "UNKNOWN" |
There was a problem hiding this comment.
Failure(_) => "UNKNOWN" swallows any exception from getTempViewOrPermanentTableMetadata. The text path lets exceptions propagate - consider following the same here
There was a problem hiding this comment.
Fixed - Try/catch block that returned "UNKNOWN" on failure has been removed. getTempViewOrPermanentTableMetadata now propagates exceptions directly, consistent with the text path.
| val json = parse(jsonStr) | ||
| val tables = (json \ "tables").asInstanceOf[JArray].arr | ||
|
|
||
| assert(tables.length == 2) |
There was a problem hiding this comment.
this test seems failing in CI?
| } | ||
| } | ||
|
|
||
| test("SHOW TABLE EXTENDED AS JSON with both local and global temp views") { |
There was a problem hiding this comment.
seems @cloud-fan 's comment about .find() masking duplicates still needs to be addressed here
There was a problem hiding this comment.
Fixed - added assert(tables.length == N) to the tests that used .find() without a count assertion. This will catch silent duplicates.
| case class ShowTables( | ||
| namespace: LogicalPlan, | ||
| pattern: Option[String], | ||
| asJson: Boolean = false, |
There was a problem hiding this comment.
Currently have duplicated JSON emission code across Exec classes. Consider a ShowTablesJsonCommand which takes the resolved namespace/catalog and handles all variants centrally, similar to DescribeRelationJsonCommand
There was a problem hiding this comment.
Addressed. We extracted a single ShowTablesJsonExec physical plan node covering both SHOW TABLES AS JSON and SHOW TABLE EXTENDED AS JSON, controlled by an isExtended: Boolean flag.
| if (asJson) { | ||
| val jsonTables = filteredTables.map { table => | ||
| JObject( | ||
| "name" -> JString(table.name()), |
There was a problem hiding this comment.
field order should be consistent between non-extended {name, namespace, isTemporary} and extended {catalog, namespace, name, type, isTemporary}
There was a problem hiding this comment.
Fixed. JSON code is consolidated in ShowTablesJsonExec.toJsonEntry(). Both schemas now have name as the first field: non-extended is {name, namespace, isTemporary} and extended is {name, catalog, namespace, type, isTemporary}.
V1 path in ShowTablesCommand follows the same ordering.
e6c7b33 to
23c5cbc
Compare
|
@cloud-fan Both addressed. For (a): we extracted a single ShowTablesJsonExec physical plan node that handles both SHOW TABLES AS JSON (isExtended=false) and SHOW TABLE EXTENDED AS JSON (isExtended=true). DataSourceV2Strategy now routes asJson=true to it for both variants, eliminating the duplicated JSON emission from ShowTablesExec and ShowTablesExtendedExec. V1 path in ShowTablesCommand remains separate since it operates on a different plan type (ResolvedV1Database). For (b): added assert(tables.length == N) to the tests that used .find() without a count check. |
| override def child: LogicalPlan = namespace | ||
| override protected def withNewChildInternal(newChild: LogicalPlan): ShowTables = | ||
| copy(namespace = newChild) | ||
| override protected def stringArgs: Iterator[Any] = Iterator(pattern, output) |
There was a problem hiding this comment.
The asJson: Boolean = false field added to ShowTables (and
the parallel field on ShowTablesExtended and ShowTablesCommand) is then
immediately hidden from treeString by an override protected def stringArgs: Iterator[Any] = Iterator(pattern, output) further down. SHOW TABLES
(3-column schema) and SHOW TABLES AS JSON (1-column json_metadata schema)
produce different output schemas; their treeString should differ. The
override exists to keep golden files passing, but that hides a real plan-level
distinction.
Suggestion: Drop the stringArgs overrides and regenerate the affected
explain/golden files. If a specific suite needs the legacy string, scope the
override narrowly to that suite's plan view rather than to the case class.
There was a problem hiding this comment.
stringArgs override was added as a workaround for golden-file CI failures caused by adding asJson: Boolean = false to ShowTables/ShowTablesExtended/ShowTablesCommand (changing their simpleString output).
We plan to fix this by removing asJson from those plan nodes entirely. Since AS JSON queries will create a dedicated command node, there will be no reason for ShowTables/ShowTablesExtended/ShowTablesCommand to carry asJson at all making the stringArgs workaround unnecessary.
Does this approach match your intent?
| } | ||
| } | ||
|
|
||
| private def runAsJson(sparkSession: SparkSession): Seq[Row] = { |
There was a problem hiding this comment.
ShowTablesCommand.runAsJson (V1) and ShowTablesJsonExec (V2)
are two emitters that have already drifted: extended catalog name is
v2SessionCatalog.name() here vs catalog.name() in V2; V1 uses
SessionCatalog.listTables (includes temp views) while V2 has a guarded
listTempViews branch.
Suggestion: collapse V1 + V2 into one ShowTablesJsonCommand built in
AstBuilder.visitShowTables / visitShowTableExtended after resolution,
pattern-matching on the resolved child (V1 session, V2 session, non-session
V2). Drop asJson from ShowTables / ShowTablesExtended /
ShowTablesCommand and the corresponding stringArgs overrides. The two emit
functions become one and the drift class closes.
There was a problem hiding this comment.
Got it. We can create a new ShowTablesJsonCommand modeled on DescribeRelationJsonCommand.
Since ShowTablesJsonCommand must live in sql/core, we override visitShowTables/visitShowTableExtended in SparkSqlAstBuilder (not AstBuilder) to produce it when AS JSON is present. Inside run(), we dispatch on CatalogV2Util.isSessionCatalog calling SessionCatalog.listTables for the session catalog and catalog.listTables for V2 catalogs.
This eliminates both ShowTablesCommand.runAsJson() and ShowTablesJsonExec.
Does this align with what you had in mind?
| // listTableAndViewSummaries(), fetch them separately. For V2SessionCatalog, | ||
| // listTables() already includes local temp views, so we skip this to avoid duplicates. | ||
| // For TableViewCatalog (non-extended path), views come from listTableAndViewSummaries(). | ||
| if (!CatalogV2Util.isSessionCatalog(catalog) && |
There was a problem hiding this comment.
This non-session V2 catalog branch
(!CatalogV2Util.isSessionCatalog(catalog) && (isExtended || !catalog.isInstanceOf[TableViewCatalog])) fetches session temp views from
sparkSession.sessionState.catalog and passes the V2 catalog's namespace as
the db argument. Local temp views live in the session catalog's reserved
temp database; global temp views live under global_temp. There is no
semantic in which a non-session V2 catalog's namespace contains session temp
views. Two potential problems:
- The first match arm passes
namespace.headasdb. Passing a non-session
V2 catalog's namespace as the session catalog's temp DB is a category error. - The fallback arm assigns
db = ""for multi-component namespaces.
SessionCatalog.listTempViewsinterprets""as local temp DB - silent
semantic mismatch.
Suggestion: Remove the branch
| } | ||
| } | ||
|
|
||
| test("SHOW TABLES AS JSON returns single row with json_metadata column") { |
There was a problem hiding this comment.
Seems tests cover happy paths only. The following code paths added by this PR are not exercised -- can you add test coverage for the below?
- Pattern-LIKE exclusion. Both V1 (
tableIdentifierPattern) and V2
(StringUtils.filterPattern) run pattern filtering, but every existing filter
test uses a pattern where all tables match. No test createsfoo1andbar1
and assertsSHOW TABLES IN ns LIKE 'foo*' AS JSONemits onlyfoo1. - Current-namespace
SHOW TABLES AS JSON(noIN). All tests useIN $catalog.ns. The grammar allowsSHOW TABLES AS JSONresolving to
CurrentNamespace(AstBuilder.scala:5798). - V1 vs V2 schema parity. The PR has two emitters but no test asserts
they produce equivalent JSON for an equivalent input. Add a parity test that
runs the same query underuseV1Command=trueanduseV1Command=false, parses
both, and asserts equaltablesarrays modulo ordering. This catches future
drift. - Raw envelope assertion. All assertions parse JSON back and check via
\. Add one literal assertion on the envelope to catch an accidental rename
"tables" -> "results".
cloud-fan
left a comment
There was a problem hiding this comment.
Re-review: 1 addressed, 1 remaining, 1 new. 1 blocking, 1 non-blocking.
The agreed ShowTablesJsonCommand refactor (your 06-01 reply on the drift thread) is not in this HEAD — the diff still has ShowTablesJsonExec + ShowTablesCommand.runAsJson. So the open threads stay blocked on it, and I'm not re-posting duplicates. Confirming the direction since you asked twice whether it aligns:
Design / architecture (2)
- (remaining) Consolidate the two JSON emitters into one resolved command. A single
ShowTablesJsonCommandmodeled onDescribeRelationJsonCommand(pattern-match the resolved child) is the right move. It closes all of: the V1↔V2 drift (#1 / @asl3 r3301780829), thestringArgshide-the-schema hack (@asl3 r3301773915, removed onceasJsonleaves the nodes), and the non-session temp-view branch category error (@asl3 r3301795477) — in one change. - (new) One caveat for the refactor. Your plan says
catalog.listTablesfor V2 catalogs. The current non-extended path deliberately usesTableViewCatalog#listTableAndViewSummaries(ShowTablesJsonExec, mirrored inShowTablesExec) so views appear alongside tables; a flatcatalog.listTableswould drop that. Please keep theTableViewCatalogbranch inShowTablesJsonCommand.
Addressed since last round: duplicate-absence test assertions (tables.length == N, distinct checks) — prior AI review #2. The happy-path-only test gaps (@asl3 r3301799400: pattern-LIKE exclusion, current-namespace SHOW TABLES AS JSON, V1/V2 parity, raw-envelope assertion) are still worth filling once the command lands.
cloud-fan
left a comment
There was a problem hiding this comment.
Re-review: 2 addressed, 0 remaining, 3 new. 2 blocking, 1 non-blocking.
Nice round — the agreed ShowTablesJsonCommand consolidation landed and closed all the open threads: the V1/V2 drift, the stringArgs hack (asJson is off the plan nodes now), the non-session temp-view category error (@asl3 r3301795477), and the prior happy-path-only test gaps (LIKE exclusion, current-namespace, V1/V2 parity, raw envelope). Two blockers below; one is a late catch on my side.
Correctness (2)
ShowTablesJsonCommand.scala:82: JSONtype—METRIC_VIEWreported asTABLE, and theTABLE/VIEWvocabulary is coarser than theDESCRIBE AS JSONsibling and the text output — see inline (late catch: predates this round, my earlier miss).tables.scala:28: import-group blank line removed → scalastyle lint CI failure — see inline.
Suggestions (1)
ShowTablesJsonCommand.scala:54: non-exhaustivechild match(crypticMatchErrorvs the analogue's explicitcase _) — see inline.
| "VIEW" | ||
| } else { | ||
| val meta = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent) | ||
| if (meta.tableType == CatalogTableType.VIEW) "VIEW" else "TABLE" |
There was a problem hiding this comment.
Two issues with the extended type field, both fixable here:
-
METRIC_VIEWis misclassified asTABLE. This checksmeta.tableType == CatalogTableType.VIEW, butCatalogTable.isViewLikeisVIEW || METRIC_VIEW(interface.scala:784-786). A persistent metric view falls through to"TABLE"— a silent wrong value. Minimal fix: usemeta.isViewLike. -
The
TABLE/VIEWvocabulary is coarser than the rest of the system. The siblingDescribeRelationJsonCommandemits the fulltableType.name(MANAGED/EXTERNAL/VIEW/METRIC_VIEW,interface.scala:670), and the textSHOW TABLE EXTENDEDexposes the same granularType. Since the JSON output is meant to be a stable machine-readable contract, collapsing toTABLE/VIEWdrops theMANAGED-vs-EXTERNALdistinction and diverges from theDESCRIBE AS JSONsibling — and it's expensive to change once the contract ships. Emittingmeta.tableType.namehere would fix the metric-view bug and the inconsistency in one go. Is the coarse vocabulary intentional, or worth aligning withDESCRIBE AS JSON?
(The V2 path at line 135 always emits TABLE; that's defensible since only catalog.listTables rows are listed there and avoiding a per-table loadTable is a reasonable perf choice — it just shares the same vocabulary question.)
This predates this round (the old ShowTablesJsonExec had if (isTemporary) "VIEW" else "TABLE"), so it's a late catch on my side — not something this refactor introduced.
There was a problem hiding this comment.
Good catch. The coarse TABLE/VIEW was carried over from the old ShowTablesJsonExec and is not an intentional design choice. Will switch to meta.tableType.name to align with DESCRIBE AS JSON (which emits MANAGED/EXTERNAL/VIEW/METRIC_VIEW via CatalogTable.toLinkedHashMap).
For the V2 path (line 135), will keep TABLE since resolving the actual type would require a per-table loadTable call
| import org.apache.hadoop.fs.{FileContext, FsConstants, Path} | ||
| import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, FsAction, FsPermission} | ||
|
|
||
| import org.apache.spark.sql.{Row, SparkSession} |
There was a problem hiding this comment.
This drops the blank line separating the 3rd-party import group (org.apache.hadoop.*) from the spark group, which fails scalastyle's ImportOrderChecker:
There should at least one a single empty line separating groups 3rdParty and spark. line=28
so the lint CI gate will be red. This blank-line removal is in fact this file's only change versus master (the runAsJson code that used to be here was never on master), so restoring the blank line reverts the file entirely:
| import org.apache.spark.sql.{Row, SparkSession} | |
| import org.apache.spark.sql.{Row, SparkSession} |
Minor related note: AstBuilder, ResolveSessionCatalog, DataSourceV2Strategy, and ShowTablesExec also carry leftover residue from the asJson refactor (formatting collapses / a behavior-preserving filter+emit split) with no net functional change — harmless, but reverting the purely cosmetic ones would keep the diff minimal.
|
|
||
| override def run(sparkSession: SparkSession): Seq[Row] = { | ||
| val jsonOutput = child match { | ||
| case ResolvedNamespace(catalog, ns, _) => |
There was a problem hiding this comment.
(non-blocking) This match handles only ResolvedNamespace, so any other child would surface as a cryptic scala.MatchError. It's unreachable today — the analyzer always resolves the namespace child to ResolvedNamespace — but the analogue this command is modeled on, DescribeRelationJsonCommand, uses an explicit case _ => throw …. Consider an explicit case _ (internal error) so the failure is clear if that invariant ever changes.
cloud-fan
left a comment
There was a problem hiding this comment.
Re-review: 3 addressed, 0 remaining, 2 new (1 newly introduced, 1 late catch).
1 blocking, 0 non-blocking, 1 nit.
Nice round — all three prior findings are cleanly addressed (granular tableType.name with the expectedTableTypeInJson hook; tables.scala fully reverted out of the diff; explicit case other internalError fallback) and the diff is down to 8 files. One blocking item below — a late catch on my side, not something this round introduced.
Heads-up: the test matrix never ran on this PR. Every
Build/test workflow on this commit (and the previous one) isskipped— the only green check is theUpdate build statusmetadata job, not the test suites. Please trigger a full CI run before merge; the failure below currently has no red signal to surface it.
Correctness (1)
- ShowTablesSuiteBase.scala:554 (and :533): the two local-temp-view tests assert session-temp-view presence against the non-session V2 catalog, which
runForV2Catalog(correctly) omits — so both fail when run by the V2ShowTablesSuite— see inline (late catch: predates this round, my earlier miss).
Nits: 1 minor item (see inline comment on ShowTablesSuiteBase.scala:688).
Verification
Traced the V2-suite routing for finding #1: SHOW TABLES[/TABLE EXTENDED] IN test_catalog.ns AS JSON builds ShowTablesJsonCommand whose child resolves to ResolvedNamespace(test_catalog, [ns]); test_catalog is a non-session InMemoryPartitionTableCatalog, so isSessionCatalog is false and it routes to runForV2Catalog, which only calls catalog.listTables/listTableAndViewSummaries and never SessionCatalog.listTempViews. The session-local temp view therefore never appears in the V2 listing, so :533's assert(tempView.isDefined) and :554's assert(tables.length == 2) fail on the V2 path. The global_temp variants (:572/:598/:623) pass because IN global_temp resolves to the session catalog even in the V2 suite. Confirmed by running command/v2/ShowTablesSuite locally: 12 run, 10 succeeded, 2 failed (exactly these two). Latent because the fork's test matrix never ran on this PR (all runs skipped).
| val jsonStr = df.collect()(0).getString(0) | ||
| val json = parse(jsonStr) | ||
| val tables = (json \ "tables").asInstanceOf[JArray].arr | ||
| assert(tables.length == 2, s"Expected 2 entries (tbl + $localTmpViewName), got: $tables") |
There was a problem hiding this comment.
This test fails when run by the V2 ShowTablesSuite. ShowTablesSuiteBase is shared by both the V1 and V2 concrete suites. Here localTmpViewName is a session-local temp view, but the query is SHOW TABLE EXTENDED IN $catalog.ns ... AS JSON. In the V2 suite $catalog is test_catalog (a non-session InMemoryPartitionTableCatalog), so resolution routes to ShowTablesJsonCommand.runForV2Catalog, which — correctly, per the agreed design (your reply to @asl3 on r3301795477) — does not fetch session temp views. So only tbl is listed and tables.length == 2 fails on the V2 path. (The global_temp variants pass because IN global_temp resolves to the session catalog even in the V2 suite.)
Suggest making session-temp-view presence catalog-aware — e.g. an expectsSessionTempViews override hook mirroring the new expectedTableTypeInJson, true only for the session/V1 path — or scoping this test (and the one at :533) to the V1 suite. Confirmed by running command/v2/ShowTablesSuite locally (12 run, 10 passed, these 2 failed); it's latent rather than red only because the fork's test matrix never ran on this PR (all runs skipped).
| val tables = (json \ "tables").asInstanceOf[JArray].arr | ||
| val names = tables.map(t => (t \ "name").extract[String]) | ||
| assert(names.distinct.length == names.length, s"Duplicate entries found: $names") | ||
| val tempView = tables.find(t => (t \ "name").extract[String] == "tv") |
There was a problem hiding this comment.
Same root cause as the comment on line 554: tv is a session-local temp view, so it is absent from the V2 catalog's listing (runForV2Catalog does not fetch session temp views), and this assert(tempView.isDefined) fails when this test runs in the V2 ShowTablesSuite.
| } | ||
| } | ||
|
|
||
| test("SHOW TABLES AS JSON with useV1Command") { |
There was a problem hiding this comment.
(nit) The body asserts namesDefault === namesV1 — a parity check across useV1Command true/false — but the renamed title no longer conveys that.
| test("SHOW TABLES AS JSON with useV1Command") { | |
| test("SHOW TABLES AS JSON produces same result for useV1Command true and false") { |
|
can you check your github action settings? CI has not been triggered for this PR. |
|
otherwise LGTM |
@cloud-fan CI skip is caused by SPARK-57074 (merged May 27), which added a condition to build_main.yml that skips the Build workflow on fork master pushes. Since this PR's head branch is master on my fork, the push-triggered CI is being skipped. I've verified all tests locally, both new and existing ShowTables* suites pass. Would it be possible to re-trigger CI from the upstream side, or should I close this PR and reopen from a named branch? |
|
please re-create the pr, passing CI is a hard requirement to merge PRs |
|
Reopening from a named branch to fix CI (SPARK-57074 skips Build on fork master pushes). |
…W TABLE EXTENDED
### What changes were proposed in this pull request?
Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:
```
SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]
```
Output: json_metadata: String
SHOW TABLES AS JSON:
`{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}`
SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
`{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}`
### Why are the changes needed?
The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.
### Does this PR introduce _any_ user-facing change?
Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.
### How was this patch tested?
- Parser tests in `ShowTablesParserSuite` for all AS JSON variants and the PARTITION + AS JSON error case.
- Execution tests in `ShowTablesSuiteBase` covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
- Manual verification in spark-shell
### Was this patch authored or co-authored using generative AI tooling?
No
---
Continuation of #[54824](#54824) (closed to move off fork `master` branch for CI)
Closes #56414 from ayushbilala/show-tables-json.
Authored-by: Ayush <bilalaayush@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…W TABLE EXTENDED
### What changes were proposed in this pull request?
Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:
```
SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]
```
Output: json_metadata: String
SHOW TABLES AS JSON:
`{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}`
SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
`{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}`
### Why are the changes needed?
The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.
### Does this PR introduce _any_ user-facing change?
Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.
### How was this patch tested?
- Parser tests in `ShowTablesParserSuite` for all AS JSON variants and the PARTITION + AS JSON error case.
- Execution tests in `ShowTablesSuiteBase` covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
- Manual verification in spark-shell
### Was this patch authored or co-authored using generative AI tooling?
No
---
Continuation of #[54824](#54824) (closed to move off fork `master` branch for CI)
Closes #56414 from ayushbilala/show-tables-json.
Authored-by: Ayush <bilalaayush@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 79dcac9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:
Output: json_metadata: String
SHOW TABLES AS JSON:
{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}Why are the changes needed?
The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.
Does this PR introduce any user-facing change?
Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.
How was this patch tested?
ShowTablesParserSuitefor all AS JSON variants and the PARTITION + AS JSON error case.ShowTablesSuiteBasecovering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.Was this patch authored or co-authored using generative AI tooling?
No