
[SPARK-10195] [SQL] Data sources Filter should not expose internal types#8403

Closed
JoshRosen wants to merge 4 commits into apache:master from JoshRosen:datasources-internal-vs-external-types

Conversation

@JoshRosen

Contributor

Spark SQL's data sources API exposes Catalyst's internal types through its Filter interfaces. This is a problem because types like UTF8String are not stable developer APIs and should not be exposed to third parties.

This issue caused incompatibilities when upgrading our spark-redshift library to work against Spark 1.5.0. To avoid these issues in the future we should only expose public types through these Filter objects. This patch accomplishes this by using CatalystTypeConverters to add the appropriate conversions.
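To illustrate what converting at the API boundary means, here is a minimal, self-contained Scala sketch that does not depend on Spark. InternalString, SketchType, InternalEquals, toExternal and translate are hypothetical stand-ins; only the EqualTo(attribute, value) shape mirrors the public data sources Filter. The actual patch relies on CatalystTypeConverters. The sketch only shows the idea: internal values are converted to ordinary JVM types, guided by the column's data type, before a Filter reaches a third-party source. The date case assumes an internal days-since-epoch encoding.

```scala
import java.nio.charset.StandardCharsets
import java.time.LocalDate

// Hypothetical stand-in for an internal, unstable string encoding
// (the role UTF8String plays inside Catalyst). Not a real Spark class.
final class InternalString(val bytes: Array[Byte]) {
  override def toString: String = new String(bytes, StandardCharsets.UTF_8)
}
object InternalString {
  def fromString(s: String): InternalString =
    new InternalString(s.getBytes(StandardCharsets.UTF_8))
}

// Hypothetical, simplified type tags. The converter needs them because the
// same internal value (e.g. an Int) can stand for different external values.
sealed trait SketchType
case object IntType extends SketchType
case object StringType extends SketchType
case object DateType extends SketchType // assumed: stored as days since epoch

// Hypothetical internal predicate, as the planner would see it.
final case class InternalEquals(column: String, dataType: SketchType, value: Any)

// Public filter handed to third-party sources: carries only stable JVM types.
sealed trait PublicFilter
final case class EqualTo(attribute: String, value: Any) extends PublicFilter

// Convert an internal value to its public counterpart, driven by its type.
def toExternal(value: Any, dataType: SketchType): Any = (value, dataType) match {
  case (null, _)                       => null
  case (s: InternalString, StringType) => s.toString
  case (days: Int, DateType)           => LocalDate.ofEpochDay(days.toLong)
  case (other, _)                      => other
}

// Translate a pushed-down predicate into a public filter at the API boundary.
def translate(p: InternalEquals): PublicFilter =
  EqualTo(p.column, toExternal(p.value, p.dataType))
```

For example, translate(InternalEquals("name", StringType, InternalString.fromString("redshift"))) yields EqualTo("name", "redshift") carrying a plain java.lang.String, so the data source never sees the internal type. Passing the data type alongside the value is what lets an internal Int be returned either as an integer or as a date.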

@SparkQA

SparkQA commented Aug 24, 2015


Test build #41482 has finished for PR 8403 at commit 6af0a45.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 24, 2015


Test build #41486 has finished for PR 8403 at commit 1a3d053.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 25, 2015


Test build #41490 has finished for PR 8403 at commit c3fb4eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Contributor

As with buildScan in the data sources API, we should not expose internal types outside Spark SQL. However, we also need to provide a way to build efficient data sources that use internal types directly (with no conversions), for our built-in data sources or for advanced users.

There may be cases where users need the internal types in Filter to avoid conversions and speed up operations. I think we need to improve our data source API to make this more flexible.
cc @liancheng

@rxin

rxin commented Aug 25, 2015

Contributor

This is done once per query, isn't it? It'd make sense to specialize the input, but I don't think it's worth it for filter pushdown.

@liancheng

Contributor

This PR LGTM.

@cloud-fan Same opinion as @rxin. Filter push-down itself isn't a critical path.
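The cost argument in this thread (the filter conversion happens once per query, not once per row) can be illustrated with a hypothetical Scala sketch. Assumptions: rows are stored as UTF-8 byte arrays, EqualTo is a local stand-in for the public filter, and scanWithFilter and ConversionCounter are invented for illustration; this is not how any Spark data source is implemented. A source that prefers internal types can convert the public filter value back once, then compare rows in their internal encoding.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Local stand-in mirroring the shape of the public EqualTo filter (hypothetical).
final case class EqualTo(attribute: String, value: Any)

// Hypothetical counter used only to show how often a conversion happens.
final class ConversionCounter { var count = 0 }

// Hypothetical scan: rows stay in an internal UTF-8 byte encoding,
// while the pushed filter carries a public java.lang.String.
def scanWithFilter(
    rows: Seq[Array[Byte]],
    filter: EqualTo,
    counter: ConversionCounter): Seq[Array[Byte]] = {
  // One conversion per query: public String -> internal bytes.
  counter.count += 1
  val target = filter.value.asInstanceOf[String].getBytes(StandardCharsets.UTF_8)
  // Per-row work is a byte comparison; no row values are converted.
  rows.filter(row => Arrays.equals(row, target))
}
```

With 1,000 rows the counter still records a single conversion; only the byte comparison scales with the number of rows.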

@rxin

rxin commented Aug 25, 2015

Contributor

I've merged this.

asfgit pushed a commit that referenced this pull request Aug 25, 2015
Author: Josh Rosen <joshrosen@databricks.com>

Closes #8403 from JoshRosen/datasources-internal-vs-external-types.

(cherry picked from commit 7bc9a8c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit closed this in 7bc9a8c Aug 25, 2015
JoshRosen deleted the datasources-internal-vs-external-types branch January 15, 2016 02:34

5 participants