[SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator (branch-1.4 backport) #8789
Closed
JoshRosen wants to merge 2 commits into
…mitCoordinator

When speculative execution is enabled, consider a scenario where the authorized committer of a particular output partition fails during the OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator is supposed to release that committer's exclusive lock on committing once that task fails. However, due to a unit mismatch (we used task attempt number in one place and task attempt id in another) the lock will not be released, causing Spark to go into an infinite retry loop.

This bug was masked by the fact that the OutputCommitCoordinator does not have enough end-to-end tests (the current tests use many mocks). Other factors contributing to this bug are the fact that we have many similarly-named identifiers that have different semantics but the same data types (e.g. attemptNumber and taskAttemptId, with inconsistent variable naming which makes them difficult to distinguish).

This patch adds a regression test and fixes this bug by always using task attempt numbers throughout this code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#8544 from JoshRosen/SPARK-10381.

(cherry picked from commit 38700ea)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
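The failure mode described in the commit message can be illustrated with a small, self-contained model. This is only a sketch, not Spark's actual code: the class `CommitCoordinatorSketch`, its method names, and the example identifier values are hypothetical, and which call site used the attempt number versus the task attempt id is an assumption made for illustration. The point it shows is that a lock acquired under one kind of identifier is never released when the failure is reported under the other kind, because both are plain integers and the comparison silently fails.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CommitCoordinatorSketch:
    """Toy model of a per-partition commit lock (hypothetical, not Spark's API)."""

    # Maps (stage, partition) -> identifier of the attempt that holds the lock.
    authorized: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def can_commit(self, stage: int, partition: int, attempt_number: int) -> bool:
        """Grant the exclusive commit lock to the first attempt that asks."""
        key = (stage, partition)
        if key not in self.authorized:
            self.authorized[key] = attempt_number
            return True
        return False  # another attempt already holds the lock

    def task_failed(self, stage: int, partition: int, reported_id: int) -> None:
        """Release the lock only if the failed attempt is the lock holder."""
        key = (stage, partition)
        if self.authorized.get(key) == reported_id:
            del self.authorized[key]


def retry_is_authorized(report_failure_with_attempt_number: bool) -> bool:
    """Return True if a retry can obtain the lock after the first committer fails."""
    coordinator = CommitCoordinatorSketch()
    stage, partition = 0, 0
    # Hypothetical values: per-task attempt number 0, globally unique attempt id 17.
    attempt_number, task_attempt_id = 0, 17

    # The first attempt acquires the lock under its attempt number.
    assert coordinator.can_commit(stage, partition, attempt_number)

    # The failure is reported either consistently (attempt number)
    # or with the mismatched unit (task attempt id).
    reported = attempt_number if report_failure_with_attempt_number else task_attempt_id
    coordinator.task_failed(stage, partition, reported)

    # The retry asks for permission to commit.
    return coordinator.can_commit(stage, partition, attempt_number + 1)
```

In this model, `retry_is_authorized(False)` leaves the lock held, so every retry is denied; `retry_is_authorized(True)` releases it, which corresponds to the fix of using task attempt numbers throughout.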
Test build #42560 has finished for PR 8789 at commit
Test build #42567 has finished for PR 8789 at commit
Contributor
Author
Jenkins, retest this please.
Test build #42575 has finished for PR 8789 at commit
Contributor
Author
Jenkins, retest this please.
Test build #42614 has finished for PR 8789 at commit
Contributor
Author
Jenkins, retest this please.
Contributor
LGTM. We can merge it once Jenkins is good.
Test build #42697 has finished for PR 8789 at commit
Contributor
Author
It looks like there haven't been any new commits to branch-1.4 since this was last tested, so I'm going to merge this now.
asfgit pushed a commit that referenced this pull request Sep 21, 2015
…mitCoordinator (branch-1.4 backport)

This is a backport of #8544 to `branch-1.4` for inclusion in 1.4.2.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8789 from JoshRosen/SPARK-10381-1.4.
This is a backport of #8544 to `branch-1.4` for inclusion in 1.4.2.