[SPARK-7736] [core] Fix a race introduced in PythonRunner. #8258
The fix for SPARK-7736 introduced a race where a port value of "-1" could be passed down to the pyspark process, causing it to fail to connect back to the JVM. This change adds code to fix that race.
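
The kind of race described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code changed by this PR: `PortRaceSketch`, `SlowServer`, and `startAndGetPort` are hypothetical names, and the stand-in server simply reports -1 until its socket is bound. The point is that the caller waits for the initialization thread before reading the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRaceSketch {
    /** Hypothetical stand-in for a server that reports -1 until its socket is bound. */
    public static final class SlowServer {
        private volatile ServerSocket socket;

        public void start() throws IOException {
            socket = new ServerSocket(0); // port 0 asks the OS for any free port
        }

        public int getListeningPort() {
            ServerSocket s = socket;
            return s == null ? -1 : s.getLocalPort();
        }

        public void shutdown() throws IOException {
            ServerSocket s = socket;
            if (s != null) {
                s.close();
            }
        }
    }

    /** Starts the server on a helper thread and waits for it before reading the port. */
    public static int startAndGetPort(SlowServer server) throws InterruptedException {
        Thread init = new Thread(() -> {
            try {
                server.start();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        init.setDaemon(true);
        init.start();
        // Without this join, getListeningPort() may run before start() has bound
        // the socket and return -1, which is the value a child process would receive.
        init.join();
        return server.getListeningPort();
    }

    public static void main(String[] args) throws Exception {
        SlowServer server = new SlowServer();
        System.out.println("before start: " + server.getListeningPort());
        int port = startAndGetPort(server);
        System.out.println("after join: " + port);
        server.shutdown();
    }
}
```

Joining the helper thread is just one way to order the two steps; it works in this sketch because `start()` returns only after the socket is bound.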

Example of the error:

retest this please

@vanzin is this the right JIRA?

Also, I've seen this non-determinism reported on the user list. It would definitely be good to fix it.

Yes; this bug was introduced by a change that I pushed this morning (to fix the same bug this PR mentions; see #7751).

Test build #41074 has finished for PR 8258 at commit

Test build #41077 has finished for PR 8258 at commit

Test build #41089 timed out for PR 8258 at commit

LGTM. I was going to ask whether there is some concurrency utility for this; you could use a task, future, or semaphore, but it might not be any less code.
I wish there was something, but the GatewayServer API is super weird.
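
For comparison, the future-based approach mentioned in the review could look roughly like the Java sketch below. It assumes a server whose startup code we control, which, per the reply above, may not hold for GatewayServer; `PortFutureSketch` and `bindAsync` are hypothetical names, and the socket is closed immediately because the sketch only demonstrates how the port is handed over.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class PortFutureSketch {
    /** Binds a socket on a helper thread and publishes the port through a future. */
    public static CompletableFuture<Integer> bindAsync() {
        CompletableFuture<Integer> port = new CompletableFuture<>();
        Thread init = new Thread(() -> {
            // The socket is closed right after binding; a real server would keep it open.
            try (ServerSocket socket = new ServerSocket(0)) {
                port.complete(socket.getLocalPort());
            } catch (IOException e) {
                port.completeExceptionally(e);
            }
        });
        init.setDaemon(true);
        init.start();
        return port;
    }

    public static void main(String[] args) throws Exception {
        // The caller blocks until the port is known, or fails after the timeout.
        int port = bindAsync().get(10, TimeUnit.SECONDS);
        System.out.println("bound port: " + port);
    }
}
```

As the review comment anticipates, this is not obviously less code than waiting on the thread directly; its main advantages are the explicit timeout and the propagation of a bind failure to the caller.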

I'll try tests again but I'm inclined to merge this soon. retest this please

Test build #41137 has finished for PR 8258 at commit

The pyspark failure is the same flaky test that has been failing on and off for a long time. I'm merging this.

The fix for SPARK-7736 introduced a race where a port value of "-1" could be passed down to the pyspark process, causing it to fail to connect back to the JVM. This change adds code to fix that race.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8258 from vanzin/SPARK-7736.

(cherry picked from commit c1840a8)