
[SPARK-18024][SQL] Introduce an internal commit protocol API#15707

Closed
rxin wants to merge 17 commits into apache:master from rxin:SPARK-18024-2

rxin (Contributor) commented Nov 1, 2016

What changes were proposed in this pull request?

This patch introduces an internal commit protocol API that batch data sources use to commit writes. It currently has only one implementation, which uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API can be used to unify streaming and batch commits.
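The following is a minimal sketch of the job/task lifecycle that a commit protocol of this kind typically follows. Tasks write to a staging location and report what they wrote when they commit. The driver then publishes the output when it commits the job. The names `CommitProtocolSketch` and `InMemoryCommitProtocol` are hypothetical, and so is the in-memory map that stands in for a file system. This is not Spark's actual API, nor is it the OutputCommitter-based implementation.

```scala
import scala.collection.mutable

// Hypothetical, simplified commit protocol; NOT Spark's real API.
trait CommitProtocolSketch {
  def setupJob(): Unit
  def newTaskTempFile(taskId: Int, fileName: String): String
  def commitTask(taskId: Int): Seq[String] // the "commit message": files this task wrote
  def abortTask(taskId: Int): Unit
  def commitJob(taskCommits: Seq[Seq[String]]): Unit
}

// Uses a mutable map (path -> contents) as a stand-in for a file system.
class InMemoryCommitProtocol(outputDir: String) extends CommitProtocolSketch {
  val fs = mutable.Map.empty[String, String]
  private val pending = mutable.Map.empty[Int, mutable.Buffer[String]]

  def setupJob(): Unit = pending.clear()

  // Tasks write under a temporary staging directory, never directly into outputDir.
  def newTaskTempFile(taskId: Int, fileName: String): String = {
    val tmp = s"$outputDir/_temporary/task-$taskId/$fileName"
    pending.getOrElseUpdate(taskId, mutable.Buffer.empty[String]) += tmp
    tmp
  }

  // Task commit: hand the list of staged files back to the driver.
  def commitTask(taskId: Int): Seq[String] =
    pending.remove(taskId).map(_.toList).getOrElse(Nil)

  // Task abort: delete the task's staged files so they never become visible.
  def abortTask(taskId: Int): Unit =
    pending.remove(taskId).foreach(_.foreach(path => fs.remove(path)))

  // Job commit (driver side): move each committed file to its final location.
  def commitJob(taskCommits: Seq[Seq[String]]): Unit =
    for (tmp <- taskCommits.flatten) {
      val finalPath = outputDir + "/" + tmp.split('/').last
      fs(finalPath) = fs(tmp)
      fs.remove(tmp)
    }
}

val protocol = new InMemoryCommitProtocol("/out")
protocol.setupJob()
val f0 = protocol.newTaskTempFile(0, "part-00000")
protocol.fs(f0) = "rows from task 0"
val f1 = protocol.newTaskTempFile(1, "part-00001")
protocol.fs(f1) = "rows from task 1"
val msg0 = protocol.commitTask(0)
protocol.abortTask(1) // simulate a failed task
protocol.commitJob(Seq(msg0))
println(protocol.fs.keys.toList.sorted) // prints List(/out/part-00000)
```

Under this design, output from aborted or failed tasks never reaches the final directory, because only files listed in committed task messages are published at job commit.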

How was this patch tested?

Should be covered by existing write tests.

rxin (Contributor, Author) commented Nov 1, 2016

This is the same as #15696, but rebased on #15633.

ericl (Contributor) commented Nov 1, 2016

This lgtm, modulo the comments in #15696

committer,
iterator = iter)
}).flatten.distinct
})

Contributor review comment:

Move the distinct to updatedPartitions?
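Here is one possible reading of this suggestion, sketched under assumptions. Suppose each task returns both a commit message and the set of partitions it wrote. Duplicates can then only arise in the partition list, so `.distinct` belongs where the updated partitions are computed rather than on the whole flattened result. `TaskResult` and the sample partition values are hypothetical. The code stands in for driver-side collection and does not reproduce the code under review.

```scala
// Hypothetical per-task result: a commit message plus the partition
// directories the task wrote to.
case class TaskResult(commitMessage: String, updatedPartitions: Set[String])

// Simulated results collected from three tasks.
val results: Seq[TaskResult] = Seq(
  TaskResult("task-0", Set("date=2016-10-31", "date=2016-11-01")),
  TaskResult("task-1", Set("date=2016-11-01")),
  TaskResult("task-2", Set.empty)
)

// Commit messages are kept one per task, with no deduplication needed.
val commitMessages: Seq[String] = results.map(_.commitMessage)

// Several tasks may write to the same partition, so deduplicate only here.
val updatedPartitions: Seq[String] =
  results.flatMap(_.updatedPartitions).distinct
```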

rxin changed the title from "[SPARK-18024][SQL] Introduce an internal commit protocol API - rebased" to "[SPARK-18024][SQL] Introduce an internal commit protocol API" on Nov 1, 2016
SparkQA commented Nov 1, 2016

Test build #67855 has finished for PR 15707 at commit 0647959.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val STREAMING_FILE_COMMIT_PROTOCOL_CLASS =
SQLConfigBuilder("spark.sql.streaming.commitProtocolClass")
.internal()

Contributor review comment:

nit: two spaces
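The internal setting above stores the name of the class used for streaming file commits. The following is a minimal sketch of how a class-name setting like this can select an implementation. To keep the sketch self-contained, a plain lookup table stands in for reflective class loading. `StreamingCommitProtocol`, both implementations, the registry, and the default class name are invented for illustration. Only the config key comes from the snippet above.

```scala
// Hypothetical interface; Spark's real commit protocol classes are not modeled here.
trait StreamingCommitProtocol { def name: String }

class DefaultCommitProtocol(path: String) extends StreamingCommitProtocol {
  def name = s"default($path)"
}
class CustomCommitProtocol(path: String) extends StreamingCommitProtocol {
  def name = s"custom($path)"
}

val ConfKey = "spark.sql.streaming.commitProtocolClass"
val DefaultClass = "DefaultCommitProtocol" // hypothetical default value

// Maps a configured class name to a factory; stands in for reflection.
val registry: Map[String, String => StreamingCommitProtocol] = Map(
  "DefaultCommitProtocol" -> ((p: String) => new DefaultCommitProtocol(p)),
  "CustomCommitProtocol" -> ((p: String) => new CustomCommitProtocol(p))
)

// Reads the class name from the conf (falling back to the default) and builds it.
def instantiate(conf: Map[String, String], path: String): StreamingCommitProtocol = {
  val className = conf.getOrElse(ConfKey, DefaultClass)
  val factory = registry.getOrElse(className,
    throw new IllegalArgumentException(s"Unknown commit protocol class: $className"))
  factory(path)
}
```

Marking such a setting internal, as the snippet does, signals that it is a hook for implementers rather than a user-facing option.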

ericl (Contributor) commented Nov 1, 2016

This LGTM, just a minor comment

SparkQA commented Nov 1, 2016

Test build #67865 has finished for PR 15707 at commit 65ba5c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor, Author) commented Nov 1, 2016

It looks like the failure was caused by a flaky test; everything else was fine. I'm going to merge this optimistically.

asfgit closed this in d9d1465 on Nov 1, 2016

SparkQA commented Nov 1, 2016

Test build #3384 has finished for PR 15707 at commit 0177ded.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HadoopCommitProtocolWrapper(path: String, isAppend: Boolean)

SparkQA commented Nov 1, 2016

Test build #3386 has finished for PR 15707 at commit 65ba5c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
This patch introduces an internal commit protocol API that is used by the batch data source to do write commits. It currently has only one implementation that uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API can be used to unify streaming and batch commits.

## How was this patch tested?
Should be covered by existing write tests.

Author: Reynold Xin <rxin@databricks.com>
Author: Eric Liang <ekl@databricks.com>

Closes apache#15707 from rxin/SPARK-18024-2.

3 participants