Skip to content

[SPARK-39799][SQL] DataSourceV2: View catalog interface#37556

Closed
jzhuge wants to merge 1 commit into
apache:masterfrom
jzhuge:SPARK-39799
Closed

[SPARK-39799][SQL] DataSourceV2: View catalog interface#37556
jzhuge wants to merge 1 commit into
apache:masterfrom
jzhuge:SPARK-39799

Conversation

@jzhuge

@jzhuge jzhuge commented Aug 17, 2022

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

ViewCatalog API described in SPIP.

Why are the changes needed?

First step towards DataSourceV2 view support.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A

@github-actions github-actions Bot added the SQL label Aug 17, 2022
@holdenk

holdenk commented Aug 18, 2022

Copy link
Copy Markdown
Contributor

cc @dongjoon-hyun who had comments on the SPIP

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you for pinging me, @holdenk .

cc @viirya , @sunchao , @huaxingao , @aokolnychyi , @RussellSpitzer

@holdenk holdenk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a good first PR for the view catalog interface to me, I think it would be useful to add a link to a working implementation of this (I think having API and impl come in seperately is fine but good to have a known impl of the API existing to know that we've got a working version of it present).

Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java Outdated
@jzhuge

jzhuge commented Aug 18, 2022

Copy link
Copy Markdown
Member Author

Find this message strange from https://github.com/jzhuge/spark/runs/7886846067?check_suite_focus=true:

[info] Found the following changed modules: catalyst, hive-thriftserver

This PR does not change module hive-thriftserver.

@holdenk

holdenk commented Aug 22, 2022

Copy link
Copy Markdown
Contributor

Interesting, the failure in CliSuite seems like a red-herring (e.g. that text is present), maybe try a re-base + re-run?

@holdenk

holdenk commented Nov 3, 2022

Copy link
Copy Markdown
Contributor

pending CI and any other concerns from others I plan to merge this on Friday.

@jzhuge

jzhuge commented Nov 3, 2022

Copy link
Copy Markdown
Member Author

Looking at error:

SparkThrowableSuite.Error classes are correctly formatted

Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java Outdated
@jzhuge jzhuge force-pushed the SPARK-39799 branch 2 times, most recently from d325925 to cd0c2a2 Compare November 4, 2022 23:38
@jzhuge

jzhuge commented Nov 5, 2022

Copy link
Copy Markdown
Member Author

Puzzled by this pyspark test failures. Seems unrelated.

[info] compiling 25 Scala sources to /__w/spark/spark/connector/docker-integration-tests/target/scala-2.12/test-classes ...
[error] /__w/spark/spark/sql/hive/src/test/scala/org/apache/spark/sql/HiveCharVarcharTestSuite.scala:49:13: exception during macro expansion: 
[error] java.util.MissingResourceException: Can't find bundle for base name org.scalactic.ScalacticBundle, locale en_US

@xkrogen

xkrogen commented Nov 7, 2022

Copy link
Copy Markdown
Contributor

Current diff LGTM. We've been testing this out internally (including changes from the umbrella PR as well) and with the recent changes I think this PR looks like it should set us up for success with the rest of the feature.

@jzhuge

jzhuge commented Nov 7, 2022

Copy link
Copy Markdown
Member Author

Looking at error:

SparkThrowableSuite.Error classes are correctly formatted

Fixed

@ljfgem

ljfgem commented Nov 10, 2022

Copy link
Copy Markdown

Hi @jzhuge @holdenk, can we merge this PR? Or is there any other concerns?

@holdenk

holdenk commented Nov 16, 2022

Copy link
Copy Markdown
Contributor

LGTM I'll merge this now to the current dev branch.

@asfgit asfgit closed this in d12abff Nov 16, 2022
@jzhuge

jzhuge commented Nov 16, 2022

Copy link
Copy Markdown
Member Author

Thanks @holdenk, @wmoustafa, @xkrogen, @ljfgem for the reviews!

String[] columnComments();

/**
* The view properties.

@cloud-fan cloud-fan Nov 21, 2022

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but shall we put everything (except for the query and schema) into view properties with reserved property keys? TableCatalog was designed this way.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

The goal is to capture the entire CREATE VIEW syntax in API. Leave the choice to catalog plugin developers on whether to use properties to implement the APIs. I'd expect HMS-backed view catalog implementation will continue to do so the same as the current v1 implementation. On the other hand, view catalog plugins that support more free-form storage formats such as JSON can choose a different approach.

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

ViewCatalog API described in [SPIP](https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing).

### Why are the changes needed?

First step towards DataSourceV2 view support.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#37556 from jzhuge/SPARK-39799.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Holden Karau <hkarau@netflix.com>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
### What changes were proposed in this pull request?

ViewCatalog API described in [SPIP](https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing).

### Why are the changes needed?

First step towards DataSourceV2 view support.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#37556 from jzhuge/SPARK-39799.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Holden Karau <hkarau@netflix.com>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
### What changes were proposed in this pull request?

ViewCatalog API described in [SPIP](https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing).

### Why are the changes needed?

First step towards DataSourceV2 view support.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#37556 from jzhuge/SPARK-39799.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Holden Karau <hkarau@netflix.com>
asfgit pushed a commit that referenced this pull request Dec 5, 2023
…iewCatalog

### What changes were proposed in this pull request?

ViewCatalog API improvements described in [SPIP](https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing) that didn't make it into the codebase as part of #37556

### Why are the changes needed?

Required for DataSourceV2 view support.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

N/A

Closes #43677 from nastra/SPARK-45807.

Authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Signed-off-by: Holden Karau <holden@pigscanfly.ca>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
…iewCatalog

### What changes were proposed in this pull request?

ViewCatalog API improvements described in [SPIP](https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing) that didn't make it into the codebase as part of apache#37556

### Why are the changes needed?

Required for DataSourceV2 view support.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

N/A

Closes apache#43677 from nastra/SPARK-45807.

Authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Signed-off-by: Holden Karau <holden@pigscanfly.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants