[SPARK-6304][Streaming] Fix checkpointing doesn't retain driver port issue. #5060
jerryshao wants to merge 6 commits into
|
Test build #28698 has finished for PR 5060 at commit
|
|
Hi @tdas, I think this is actually a bug. Would you please help review this one? |
|
It has been pending for a long time. |
|
Test build #30926 has finished for PR 5060 at commit
|
It doesn't seem worth tacking on yet more little fields in SparkContext just for a niche use case in a submodule. Use the config object in Checkpoint.
I am not sure it is a good idea to clutter SparkContext further with such functions, especially when Spark core itself does not use it. Would be good to find a different solution.
But I think there has to be a place in Spark Core to determine whether this configuration was set by the user or by Spark itself before SparkContext is initialized, either in SparkConf or somewhere else. It cannot be obtained from Spark Streaming, where everything in SparkContext has already been initialized.
I'm not sure how to track this in Checkpoint. SparkEnv will reset this configuration if the user has not set it, so how can Checkpoint differentiate whether it was set by the user or by SparkEnv?
|
I thought more about this particular issue. These two lines in the SparkContext actually set the two parameters explicitly, with host = local hostname and port = 0, ONLY IF they are not set in the user-provided conf. For each of them, here is my observation.
I think this simplifies the whole idea: never save host, always save port. Isn't it? |
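The set-only-if-absent behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Spark's code: the conf is modelled as a plain immutable `Map`, and `DriverDefaultsSketch`, `setIfMissing`, `withDriverDefaults`, and the hostname `node-a` are hypothetical names chosen for illustration.

```scala
object DriverDefaultsSketch {
  // Hypothetical stand-in for a Spark conf: just key/value pairs.
  type Conf = Map[String, String]

  // Add a key only when the user has not already supplied it.
  def setIfMissing(conf: Conf, key: String, value: String): Conf =
    if (conf.contains(key)) conf else conf + (key -> value)

  // Models the two lines discussed above: host defaults to the local
  // hostname, port defaults to 0 (meaning "pick any free port").
  def withDriverDefaults(userConf: Conf, localHostName: String): Conf = {
    val withHost = setIfMissing(userConf, "spark.driver.host", localHostName)
    setIfMissing(withHost, "spark.driver.port", "0")
  }

  def main(args: Array[String]): Unit = {
    // User set nothing: both keys are filled in with defaults.
    println(withDriverDefaults(Map.empty, "node-a"))
    // User pinned the port: the pinned value is kept.
    println(withDriverDefaults(Map("spark.driver.port" -> "7777"), "node-a"))
  }
}
```

In this model both keys are always present once the defaults have been applied, so looking at the resulting conf alone does not reveal whether a value came from the user or from the defaulting step.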
|
Test build #31052 has started for PR 5060 at commit |
|
I see, that's the problem. It's not clear from the streaming code what value the user had set before SparkEnv set the port value. Without that, this problem cannot be solved. |
|
bump. What is the status on this PR @jerryshao @tdas? |
|
retest this please |
|
Hi @andrewor14, I think this is indeed a bug in Spark Streaming checkpointing. My implementation just adds two fields to SparkContext to save a snapshot of these two configurations but, as TD pointed out, this is not very elegant. Currently I cannot figure out a better place to hold this port number, so any suggestion is greatly appreciated. |
|
Test build #35191 has finished for PR 5060 at commit
|
|
The tricky part for this PR is to figure out when the port was specified by
|
|
@jerryshao let's finalize this PR. Here is what I think we should do. Currently, we remove spark.driver.* from the conf used to create the recovered streaming context, ignoring the fact that the user may be setting those properties explicitly in spark-defaults.conf. That is wrong. The general policy should be to never recover spark.driver.* from the checkpointed conf. Then, if those properties are set in the defaults, they will be present in the final conf for the restarted context; otherwise they won't be. This solves the original problem in the JIRA. If someone wants to set the port explicitly, they can set it in spark-defaults.conf. With the above change, it will not be explicitly deleted when recovering and will automatically be used in the recovered context. Sounds good? If so, please update the PR. |
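The policy proposed above can be sketched roughly as follows. This is an illustrative model, not the code in the PR: confs are plain `Map`s rather than Spark conf objects, and `CheckpointConfSketch`, `driverKeys`, and `recoveredConf` are hypothetical names. The `defaults` argument stands in for whatever a freshly loaded conf would contain at restart (for example, values from spark-defaults.conf).

```scala
object CheckpointConfSketch {
  // Hypothetical stand-in for a Spark conf: just key/value pairs.
  type Conf = Map[String, String]

  // Keys that should never be restored from the checkpointed conf.
  val driverKeys: Seq[String] = Seq("spark.driver.host", "spark.driver.port")

  // checkpointed: conf pairs saved in the checkpoint.
  // defaults:     conf as loaded fresh when the application restarts.
  def recoveredConf(checkpointed: Conf, defaults: Conf): Conf = {
    // 1. Drop the driver keys that were saved in the checkpoint.
    val withoutDriver = checkpointed -- driverKeys
    // 2. Re-add each driver key only if the restart environment defines it.
    val reloaded = driverKeys.flatMap { k =>
      defaults.get(k).map(v => k -> v).toList
    }
    withoutDriver ++ reloaded
  }

  def main(args: Array[String]): Unit = {
    val cp = Map(
      "spark.app.name" -> "demo",
      "spark.driver.host" -> "old-host",
      "spark.driver.port" -> "9999")
    // Nothing set at restart: the stale driver keys are gone.
    println(recoveredConf(cp, Map.empty))
    // Port pinned at restart: the pinned value is used, not the stale one.
    println(recoveredConf(cp, Map("spark.driver.port" -> "7777")))
  }
}
```

The two calls in `main` cover both outcomes: a driver key absent from the restart defaults does not survive recovery, while one present in the defaults replaces whatever the checkpoint recorded.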
|
Yeah, that sounds good :), let me update the code. |
Could you keep these alphabetically sorted? It looks cleaner.
|
Test build #37341 has finished for PR 5060 at commit
|
This just tests whether it's correctly set in the new conf when it is set as a system property. You should also test the other case, where the new conf does not have these properties when they are not set as system properties, even though they were present in the original conf.
OK, I will update the test
|
Test build #37352 has finished for PR 5060 at commit
|
|
Test build #37446 has finished for PR 5060 at commit
|