Skip to content

[SPARK-53840][SQL] Add AS JSON output support for SHOW TABLES and SHOW TABLE EXTENDED#54824

Closed
ayushbilala wants to merge 4 commits into
apache:masterfrom
ayushbilala:master
Closed

[SPARK-53840][SQL] Add AS JSON output support for SHOW TABLES and SHOW TABLE EXTENDED#54824
ayushbilala wants to merge 4 commits into
apache:masterfrom
ayushbilala:master

Conversation

@ayushbilala

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:

SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]

Output: json_metadata: String

SHOW TABLES AS JSON:
{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}

SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}

Why are the changes needed?

The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.

Does this PR introduce any user-facing change?

Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.

How was this patch tested?

  • Parser tests in ShowTablesParserSuite for all AS JSON variants and the PARTITION + AS JSON error case.
  • Execution tests in ShowTablesSuiteBase covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
  • Manual verification in spark-shell

Was this patch authored or co-authored using generative AI tooling?

No

@ayushbilala ayushbilala force-pushed the master branch 3 times, most recently from 91aeb50 to 625fbe2 Compare March 18, 2026 05:33
@ayushbilala

Copy link
Copy Markdown
Contributor Author

@asl3 Have you had the opportunity to review this pull request yet?

@asl3 asl3 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Overall looks good to me.

Can you also add test coverage for temp views (uses a separate listTempViews call)?

@ayushbilala

Copy link
Copy Markdown
Contributor Author

@asl3 I have added tests for temp views. Can you please check?

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR adds AS JSON output mode to SHOW TABLES and SHOW TABLE EXTENDED, threading an asJson: Boolean flag through the parser, logical plan nodes (ShowTables, ShowTablesExtended), resolution (ResolveSessionCatalog, DataSourceV2Strategy), and execution (ShowTablesCommand for V1, ShowTablesExec/ShowTablesExtendedExec for V2). When asJson=true, a single-row result with a json_metadata column is returned. PARTITION + AS JSON is rejected at parse time.

General comments

  • The JSON generation logic is duplicated across three execution classes (ShowTablesCommand, ShowTablesExec, ShowTablesExtendedExec), and findings #1 and #2 below are direct consequences of that duplication — different listing strategies for temp views in V2, different ways to obtain the catalog name in V1. The existing DESCRIBE TABLE ... AS JSON avoids this entirely by using a separate command class (DescribeRelationJsonCommand) — a single UnaryRunnableCommand that pattern-matches on the already-resolved child and handles all catalog types in one place, with no need for separate V1 and V2 implementations. Consider following the same pattern: a single ShowTablesJsonCommand that receives the resolved namespace, gets the catalog from it, and centralizes JSON generation. This eliminates the V1/V2 divergence that caused both bugs.

  • No test checks for the absence of duplicate entries in JSON output — all tests use .find() which returns the first match. Adding an assertion on tables.length (expected count) would catch duplicates.

}.toList

val jsonTempViews = if (CatalogV2Util.isSessionCatalog(catalog)) {
val sessionCatalog = session.sessionState.catalog

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catalog.listTables() via V2SessionCatalog calls SessionCatalog.listTables(db, "*", includeLocalTempViews=true) which already includes local temp views. This separate listTempViews() call adds them a second time, causing duplicate entries in the JSON output when the V2 session catalog is used (when useV1Command=false).

The non-JSON path at line 78 does not list temp views separately — it correctly relies on what catalog.listTables() returns.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I will remove the redundant sessionCatalog.listTempViews(...) block entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. JSON path has been moved to the new ShowTablesJsonExec which guards the separate listTempViews() call with !CatalogV2Util.isSessionCatalog(catalog), matching the logic of the non-JSON path.

When catalog is V2SessionCatalog, temp views are already included in listTables() results and the separate call is skipped.

Comment on lines +976 to +977
"catalog" -> JString(
sparkSession.sessionState.catalogManager.currentCatalog.name()),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentCatalog returns the user's currently active catalog (set via USE), but ShowTablesCommand always operates on the session catalog (matched via ResolvedV1Database). If the user runs e.g. USE my_v2_catalog; SHOW TABLE EXTENDED IN spark_catalog.mydb LIKE '*' AS JSON, this reports "catalog": "my_v2_catalog" instead of "spark_catalog". The V2 exec correctly uses catalog.name() instead.

Suggested change
"catalog" -> JString(
sparkSession.sessionState.catalogManager.currentCatalog.name()),
"catalog" -> JString(
sparkSession.sessionState.catalogManager.v2SessionCatalog.name()),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the scoping distinction with USE. I'll update the implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - now uses sparkSession.sessionState.catalogManager.v2SessionCatalog.name() as suggested. This correctly returns spark_catalog regardless of which catalog the user has set active via USE.

"DESC TABLE COLUMN for a specific partition."
]
},
"SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION" : {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UNSUPPORTED_FEATURE sub-classes are alphabetically ordered. SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION (S-H) is placed between DESC_TABLE_COLUMN_PARTITION (D-E) and DROP_DATABASE (D-R), but it should be placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION is now placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.

@ayushbilala

Copy link
Copy Markdown
Contributor Author

Addressed all review comments. Please take another pass when you get a chance. Thanks!

case Seq(db) => Some(db)
case _ => None
}
val tempViews = sessionCatalog.listTempViews(db.getOrElse(""), pattern)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For V2SessionCatalog, catalog.listTables(namespace.toArray) already includes local temp views (and globals when ns is global_temp) via SessionCatalog.listTables(db, "*", includeLocalTempViews=true).

This block lists them again, so each temp view is emitted twice: once with type=TABLE, isTemporary=false, once with type=VIEW, isTemporary=true. The same duplicate path was removed from ShowTablesExec.

Should either skip this listTempViews block (matching ShowTablesExec) and use an isTempView-style helper above, or filter temp views out of tables before emission.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. JSON path has been extracted to ShowTablesJsonExec. There is a guard !CatalogV2Util.isSessionCatalog(catalog) to prevent the duplicate listTempViews() fetch when catalog is V2SessionCatalog.

Remaining code in ShowTablesExtendedExec is the text-only path, which is unchanged.

val tables = catalog.listTables(namespace.toArray)
tables.foreach { tableIdent =>
if (StringUtils.filterPattern(Seq(tableIdent.name()), pattern).nonEmpty) {
// V2 persistent tables are always TABLE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "V2 persistent tables are always TABLE" and unconditional type=TABLE, isTemporary=false is wrong when catalog is V2SessionCatalog. Can use a per-row isTempView check (example: ShowTablesExec.isTempView)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. In ShowTablesJsonExec, each identifier from listTables() / listTableAndViewSummaries() is now passed through isTempView(ident), which checks SessionCatalog.isTempView when catalog is V2SessionCatalog.

Temp views returned by listTables() are correctly identified and emitted with type=VIEW, isTemporary=true.

Try(catalog.getTempViewOrPermanentTableMetadata(tableIdent)) match {
case Success(meta) if meta.tableType == CatalogTableType.VIEW => "VIEW"
case Success(_) => "TABLE"
case Failure(_) => "UNKNOWN"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure(_) => "UNKNOWN" swallows any exception from getTempViewOrPermanentTableMetadata. The text path lets exceptions propagate - consider following the same here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - Try/catch block that returned "UNKNOWN" on failure has been removed. getTempViewOrPermanentTableMetadata now propagates exceptions directly, consistent with the text path.

val json = parse(jsonStr)
val tables = (json \ "tables").asInstanceOf[JArray].arr

assert(tables.length == 2)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test seems failing in CI?

}
}

test("SHOW TABLE EXTENDED AS JSON with both local and global temp views") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems @cloud-fan 's comment about .find() masking duplicates still needs to be addressed here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - added assert(tables.length == N) to the tests that used .find() without a count assertion. This will catch silent duplicates.

case class ShowTables(
namespace: LogicalPlan,
pattern: Option[String],
asJson: Boolean = false,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently have duplicated JSON emission code across Exec classes. Consider a ShowTablesJsonCommand which takes the resolved namespace/catalog and handles all variants centrally, similar to DescribeRelationJsonCommand

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. We extracted a single ShowTablesJsonExec physical plan node covering both SHOW TABLES AS JSON and SHOW TABLE EXTENDED AS JSON, controlled by an isExtended: Boolean flag.

if (asJson) {
val jsonTables = filteredTables.map { table =>
JObject(
"name" -> JString(table.name()),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field order should be consistent between non-extended {name, namespace, isTemporary} and extended {catalog, namespace, name, type, isTemporary}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. JSON code is consolidated in ShowTablesJsonExec.toJsonEntry(). Both schemas now have name as the first field: non-extended is {name, namespace, isTemporary} and extended is {name, catalog, namespace, type, isTemporary}.

V1 path in ShowTablesCommand follows the same ordering.

@ayushbilala ayushbilala force-pushed the master branch 2 times, most recently from e6c7b33 to 23c5cbc Compare May 8, 2026 11:53
@ayushbilala

Copy link
Copy Markdown
Contributor Author

@cloud-fan Both addressed.

For (a): we extracted a single ShowTablesJsonExec physical plan node that handles both SHOW TABLES AS JSON (isExtended=false) and SHOW TABLE EXTENDED AS JSON (isExtended=true).

DataSourceV2Strategy now routes asJson=true to it for both variants, eliminating the duplicated JSON emission from ShowTablesExec and ShowTablesExtendedExec. V1 path in ShowTablesCommand remains separate since it operates on a different plan type (ResolvedV1Database).

For (b): added assert(tables.length == N) to the tests that used .find() without a count check.

override def child: LogicalPlan = namespace
override protected def withNewChildInternal(newChild: LogicalPlan): ShowTables =
copy(namespace = newChild)
override protected def stringArgs: Iterator[Any] = Iterator(pattern, output)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asJson: Boolean = false field added to ShowTables (and
the parallel field on ShowTablesExtended and ShowTablesCommand) is then
immediately hidden from treeString by an override protected def stringArgs: Iterator[Any] = Iterator(pattern, output) further down. SHOW TABLES
(3-column schema) and SHOW TABLES AS JSON (1-column json_metadata schema)
produce different output schemas; their treeString should differ. The
override exists to keep golden files passing, but that hides a real plan-level
distinction.

Suggestion: Drop the stringArgs overrides and regenerate the affected
explain/golden files. If a specific suite needs the legacy string, scope the
override narrowly to that suite's plan view rather than to the case class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringArgs override was added as a workaround for golden-file CI failures caused by adding asJson: Boolean = false to ShowTables/ShowTablesExtended/ShowTablesCommand (changing their simpleString output).

We plan to fix this by removing asJson from those plan nodes entirely. Since AS JSON queries will create a dedicated command node, there will be no reason for ShowTables/ShowTablesExtended/ShowTablesCommand to carry asJson at all making the stringArgs workaround unnecessary.

Does this approach match your intent?

}
}

private def runAsJson(sparkSession: SparkSession): Seq[Row] = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShowTablesCommand.runAsJson (V1) and ShowTablesJsonExec (V2)
are two emitters that have already drifted: extended catalog name is
v2SessionCatalog.name() here vs catalog.name() in V2; V1 uses
SessionCatalog.listTables (includes temp views) while V2 has a guarded
listTempViews branch.

Suggestion: collapse V1 + V2 into one ShowTablesJsonCommand built in
AstBuilder.visitShowTables / visitShowTableExtended after resolution,
pattern-matching on the resolved child (V1 session, V2 session, non-session
V2). Drop asJson from ShowTables / ShowTablesExtended /
ShowTablesCommand and the corresponding stringArgs overrides. The two emit
functions become one and the drift class closes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. We can create a new ShowTablesJsonCommand modeled on DescribeRelationJsonCommand.

Since ShowTablesJsonCommand must live in sql/core, we override visitShowTables/visitShowTableExtended in SparkSqlAstBuilder (not AstBuilder) to produce it when AS JSON is present. Inside run(), we dispatch on CatalogV2Util.isSessionCatalog calling SessionCatalog.listTables for the session catalog and catalog.listTables for V2 catalogs.

This eliminates both ShowTablesCommand.runAsJson() and ShowTablesJsonExec.

Does this align with what you had in mind?

// listTableAndViewSummaries(), fetch them separately. For V2SessionCatalog,
// listTables() already includes local temp views, so we skip this to avoid duplicates.
// For TableViewCatalog (non-extended path), views come from listTableAndViewSummaries().
if (!CatalogV2Util.isSessionCatalog(catalog) &&

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This non-session V2 catalog branch
(!CatalogV2Util.isSessionCatalog(catalog) && (isExtended || !catalog.isInstanceOf[TableViewCatalog])) fetches session temp views from
sparkSession.sessionState.catalog and passes the V2 catalog's namespace as
the db argument. Local temp views live in the session catalog's reserved
temp database; global temp views live under global_temp. There is no
semantic in which a non-session V2 catalog's namespace contains session temp
views. Two potential problems:

  1. The first match arm passes namespace.head as db. Passing a non-session
    V2 catalog's namespace as the session catalog's temp DB is a category error.
  2. The fallback arm assigns db = "" for multi-component namespaces.
    SessionCatalog.listTempViews interprets "" as local temp DB - silent
    semantic mismatch.

Suggestion: Remove the branch

}
}

test("SHOW TABLES AS JSON returns single row with json_metadata column") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems tests cover happy paths only. The following code paths added by this PR are not exercised -- can you add test coverage for the below?

  1. Pattern-LIKE exclusion. Both V1 (tableIdentifierPattern) and V2
    (StringUtils.filterPattern) run pattern filtering, but every existing filter
    test uses a pattern where all tables match. No test creates foo1 and bar1
    and asserts SHOW TABLES IN ns LIKE 'foo*' AS JSON emits only foo1.
  2. Current-namespace SHOW TABLES AS JSON (no IN). All tests use IN $catalog.ns. The grammar allows SHOW TABLES AS JSON resolving to
    CurrentNamespace (AstBuilder.scala:5798).
  3. V1 vs V2 schema parity. The PR has two emitters but no test asserts
    they produce equivalent JSON for an equivalent input. Add a parity test that
    runs the same query under useV1Command=true and useV1Command=false, parses
    both, and asserts equal tables arrays modulo ordering. This catches future
    drift.
  4. Raw envelope assertion. All assertions parse JSON back and check via
    \. Add one literal assertion on the envelope to catch an accidental rename
    "tables" -> "results".

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: 1 addressed, 1 remaining, 1 new. 1 blocking, 1 non-blocking.

The agreed ShowTablesJsonCommand refactor (your 06-01 reply on the drift thread) is not in this HEAD — the diff still has ShowTablesJsonExec + ShowTablesCommand.runAsJson. So the open threads stay blocked on it, and I'm not re-posting duplicates. Confirming the direction since you asked twice whether it aligns:

Design / architecture (2)

  • (remaining) Consolidate the two JSON emitters into one resolved command. A single ShowTablesJsonCommand modeled on DescribeRelationJsonCommand (pattern-match the resolved child) is the right move. It closes all of: the V1↔V2 drift (#1 / @asl3 r3301780829), the stringArgs hide-the-schema hack (@asl3 r3301773915, removed once asJson leaves the nodes), and the non-session temp-view branch category error (@asl3 r3301795477) — in one change.
  • (new) One caveat for the refactor. Your plan says catalog.listTables for V2 catalogs. The current non-extended path deliberately uses TableViewCatalog#listTableAndViewSummaries (ShowTablesJsonExec, mirrored in ShowTablesExec) so views appear alongside tables; a flat catalog.listTables would drop that. Please keep the TableViewCatalog branch in ShowTablesJsonCommand.

Addressed since last round: duplicate-absence test assertions (tables.length == N, distinct checks) — prior AI review #2. The happy-path-only test gaps (@asl3 r3301799400: pattern-LIKE exclusion, current-namespace SHOW TABLES AS JSON, V1/V2 parity, raw-envelope assertion) are still worth filling once the command lands.

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: 2 addressed, 0 remaining, 3 new. 2 blocking, 1 non-blocking.

Nice round — the agreed ShowTablesJsonCommand consolidation landed and closed all the open threads: the V1/V2 drift, the stringArgs hack (asJson is off the plan nodes now), the non-session temp-view category error (@asl3 r3301795477), and the prior happy-path-only test gaps (LIKE exclusion, current-namespace, V1/V2 parity, raw envelope). Two blockers below; one is a late catch on my side.

Correctness (2)

  • ShowTablesJsonCommand.scala:82: JSON typeMETRIC_VIEW reported as TABLE, and the TABLE/VIEW vocabulary is coarser than the DESCRIBE AS JSON sibling and the text output — see inline (late catch: predates this round, my earlier miss).
  • tables.scala:28: import-group blank line removed → scalastyle lint CI failure — see inline.

Suggestions (1)

  • ShowTablesJsonCommand.scala:54: non-exhaustive child match (cryptic MatchError vs the analogue's explicit case _) — see inline.

"VIEW"
} else {
val meta = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
if (meta.tableType == CatalogTableType.VIEW) "VIEW" else "TABLE"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues with the extended type field, both fixable here:

  1. METRIC_VIEW is misclassified as TABLE. This checks meta.tableType == CatalogTableType.VIEW, but CatalogTable.isViewLike is VIEW || METRIC_VIEW (interface.scala:784-786). A persistent metric view falls through to "TABLE" — a silent wrong value. Minimal fix: use meta.isViewLike.

  2. The TABLE/VIEW vocabulary is coarser than the rest of the system. The sibling DescribeRelationJsonCommand emits the full tableType.name (MANAGED/EXTERNAL/VIEW/METRIC_VIEW, interface.scala:670), and the text SHOW TABLE EXTENDED exposes the same granular Type. Since the JSON output is meant to be a stable machine-readable contract, collapsing to TABLE/VIEW drops the MANAGED-vs-EXTERNAL distinction and diverges from the DESCRIBE AS JSON sibling — and it's expensive to change once the contract ships. Emitting meta.tableType.name here would fix the metric-view bug and the inconsistency in one go. Is the coarse vocabulary intentional, or worth aligning with DESCRIBE AS JSON?

(The V2 path at line 135 always emits TABLE; that's defensible since only catalog.listTables rows are listed there and avoiding a per-table loadTable is a reasonable perf choice — it just shares the same vocabulary question.)

This predates this round (the old ShowTablesJsonExec had if (isTemporary) "VIEW" else "TABLE"), so it's a late catch on my side — not something this refactor introduced.

@ayushbilala ayushbilala Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The coarse TABLE/VIEW was carried over from the old ShowTablesJsonExec and is not an intentional design choice. Will switch to meta.tableType.name to align with DESCRIBE AS JSON (which emits MANAGED/EXTERNAL/VIEW/METRIC_VIEW via CatalogTable.toLinkedHashMap).

For the V2 path (line 135), will keep TABLE since resolving the actual type would require a per-table loadTable call

import org.apache.hadoop.fs.{FileContext, FsConstants, Path}
import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, FsAction, FsPermission}

import org.apache.spark.sql.{Row, SparkSession}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops the blank line separating the 3rd-party import group (org.apache.hadoop.*) from the spark group, which fails scalastyle's ImportOrderChecker:

There should at least one a single empty line separating groups 3rdParty and spark. line=28

so the lint CI gate will be red. This blank-line removal is in fact this file's only change versus master (the runAsJson code that used to be here was never on master), so restoring the blank line reverts the file entirely:

Suggested change
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}

Minor related note: AstBuilder, ResolveSessionCatalog, DataSourceV2Strategy, and ShowTablesExec also carry leftover residue from the asJson refactor (formatting collapses / a behavior-preserving filter+emit split) with no net functional change — harmless, but reverting the purely cosmetic ones would keep the diff minimal.


override def run(sparkSession: SparkSession): Seq[Row] = {
val jsonOutput = child match {
case ResolvedNamespace(catalog, ns, _) =>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) This match handles only ResolvedNamespace, so any other child would surface as a cryptic scala.MatchError. It's unreachable today — the analyzer always resolves the namespace child to ResolvedNamespace — but the analogue this command is modeled on, DescribeRelationJsonCommand, uses an explicit case _ => throw …. Consider an explicit case _ (internal error) so the failure is clear if that invariant ever changes.

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: 3 addressed, 0 remaining, 2 new (1 newly introduced, 1 late catch).

1 blocking, 0 non-blocking, 1 nit.

Nice round — all three prior findings are cleanly addressed (granular tableType.name with the expectedTableTypeInJson hook; tables.scala fully reverted out of the diff; explicit case other internalError fallback) and the diff is down to 8 files. One blocking item below — a late catch on my side, not something this round introduced.

Heads-up: the test matrix never ran on this PR. Every Build/test workflow on this commit (and the previous one) is skipped — the only green check is the Update build status metadata job, not the test suites. Please trigger a full CI run before merge; the failure below currently has no red signal to surface it.

Correctness (1)

  • ShowTablesSuiteBase.scala:554 (and :533): the two local-temp-view tests assert session-temp-view presence against the non-session V2 catalog, which runForV2Catalog (correctly) omits — so both fail when run by the V2 ShowTablesSuite — see inline (late catch: predates this round, my earlier miss).

Nits: 1 minor item (see inline comment on ShowTablesSuiteBase.scala:688).

Verification

Traced the V2-suite routing for finding #1: SHOW TABLES[/TABLE EXTENDED] IN test_catalog.ns AS JSON builds ShowTablesJsonCommand whose child resolves to ResolvedNamespace(test_catalog, [ns]); test_catalog is a non-session InMemoryPartitionTableCatalog, so isSessionCatalog is false and it routes to runForV2Catalog, which only calls catalog.listTables/listTableAndViewSummaries and never SessionCatalog.listTempViews. The session-local temp view therefore never appears in the V2 listing, so :533's assert(tempView.isDefined) and :554's assert(tables.length == 2) fail on the V2 path. The global_temp variants (:572/:598/:623) pass because IN global_temp resolves to the session catalog even in the V2 suite. Confirmed by running command/v2/ShowTablesSuite locally: 12 run, 10 succeeded, 2 failed (exactly these two). Latent because the fork's test matrix never ran on this PR (all runs skipped).

val jsonStr = df.collect()(0).getString(0)
val json = parse(jsonStr)
val tables = (json \ "tables").asInstanceOf[JArray].arr
assert(tables.length == 2, s"Expected 2 entries (tbl + $localTmpViewName), got: $tables")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails when run by the V2 ShowTablesSuite. ShowTablesSuiteBase is shared by both the V1 and V2 concrete suites. Here localTmpViewName is a session-local temp view, but the query is SHOW TABLE EXTENDED IN $catalog.ns ... AS JSON. In the V2 suite $catalog is test_catalog (a non-session InMemoryPartitionTableCatalog), so resolution routes to ShowTablesJsonCommand.runForV2Catalog, which — correctly, per the agreed design (your reply to @asl3 on r3301795477) — does not fetch session temp views. So only tbl is listed and tables.length == 2 fails on the V2 path. (The global_temp variants pass because IN global_temp resolves to the session catalog even in the V2 suite.)

Suggest making session-temp-view presence catalog-aware — e.g. an expectsSessionTempViews override hook mirroring the new expectedTableTypeInJson, true only for the session/V1 path — or scoping this test (and the one at :533) to the V1 suite. Confirmed by running command/v2/ShowTablesSuite locally (12 run, 10 passed, these 2 failed); it's latent rather than red only because the fork's test matrix never ran on this PR (all runs skipped).

val tables = (json \ "tables").asInstanceOf[JArray].arr
val names = tables.map(t => (t \ "name").extract[String])
assert(names.distinct.length == names.length, s"Duplicate entries found: $names")
val tempView = tables.find(t => (t \ "name").extract[String] == "tv")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same root cause as the comment on line 554: tv is a session-local temp view, so it is absent from the V2 catalog's listing (runForV2Catalog does not fetch session temp views), and this assert(tempView.isDefined) fails when this test runs in the V2 ShowTablesSuite.

}
}

test("SHOW TABLES AS JSON with useV1Command") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) The body asserts namesDefault === namesV1 — a parity check across useV1Command true/false — but the renamed title no longer conveys that.

Suggested change
test("SHOW TABLES AS JSON with useV1Command") {
test("SHOW TABLES AS JSON produces same result for useV1Command true and false") {

@cloud-fan

Copy link
Copy Markdown
Contributor

can you check your github action settings? CI has not been triggered for this PR.

@cloud-fan

Copy link
Copy Markdown
Contributor

otherwise LGTM

@ayushbilala

Copy link
Copy Markdown
Contributor Author

can you check your github action settings? CI has not been triggered for this PR.

@cloud-fan CI skip is caused by SPARK-57074 (merged May 27), which added a condition to build_main.yml that skips the Build workflow on fork master pushes. Since this PR's head branch is master on my fork, the push-triggered CI is being skipped.

I've verified all tests locally, both new and existing ShowTables* suites pass. Would it be possible to re-trigger CI from the upstream side, or should I close this PR and reopen from a named branch?

@cloud-fan

Copy link
Copy Markdown
Contributor

please re-create the pr, passing CI is a hard requirement to merge PRs

@ayushbilala

ayushbilala commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Reopening from a named branch to fix CI (SPARK-57074 skips Build on fork master pushes).
Continued in #56414.

@ayushbilala ayushbilala closed this Jun 9, 2026
cloud-fan pushed a commit that referenced this pull request Jun 11, 2026
…W TABLE EXTENDED

### What changes were proposed in this pull request?
Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:
```
SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]
```
Output: json_metadata: String

SHOW TABLES AS JSON:
`{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}`

SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
`{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}`

### Why are the changes needed?
The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.

### Does this PR introduce _any_ user-facing change?
Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.

### How was this patch tested?
- Parser tests in `ShowTablesParserSuite` for all AS JSON variants and the PARTITION + AS JSON error case.
- Execution tests in `ShowTablesSuiteBase` covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
- Manual verification in spark-shell

### Was this patch authored or co-authored using generative AI tooling?
No

---
Continuation of #[54824](#54824) (closed to move off fork `master` branch for CI)

Closes #56414 from ayushbilala/show-tables-json.

Authored-by: Ayush <bilalaayush@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 11, 2026
…W TABLE EXTENDED

### What changes were proposed in this pull request?
Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:
```
SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]
```
Output: json_metadata: String

SHOW TABLES AS JSON:
`{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}`

SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
`{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}`

### Why are the changes needed?
The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.

### Does this PR introduce _any_ user-facing change?
Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.

### How was this patch tested?
- Parser tests in `ShowTablesParserSuite` for all AS JSON variants and the PARTITION + AS JSON error case.
- Execution tests in `ShowTablesSuiteBase` covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
- Manual verification in spark-shell

### Was this patch authored or co-authored using generative AI tooling?
No

---
Continuation of #[54824](#54824) (closed to move off fork `master` branch for CI)

Closes #56414 from ayushbilala/show-tables-json.

Authored-by: Ayush <bilalaayush@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 79dcac9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants