[SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (branch-1.2) #5174
…g to load classes (master branch PR)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang. See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources. It also adds logging for unexpected error cases.

This PR is an extended version of apache#4935 and adds a regression test.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:

e0e3c25 [Josh Rosen] Wrap try block around getResponseCode; re-enable keep-alive by closing error stream
961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
7ee2261 [Josh Rosen] Add a failing regression test
e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader

(cherry picked from commit 7215aa7)
Signed-off-by: Andrew Or <andrew@databricks.com>

Conflicts:
  repl/pom.xml
  repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala
Test build #29114 has started for PR 5174 at commit
Test build #29114 has finished for PR 5174 at commit
Test PASSed.

Jenkins, retest this please.

Test build #29616 has started for PR 5174 at commit
Test build #29616 has finished for PR 5174 at commit
Test PASSed.

I'm going to merge this now (this is just a backport of the 1.3.x / master branch version of this patch and the relevant code is now pretty well-covered by tests, so this should be safe to include).
…g to load classes (branch-1.2)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang. See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources. It also adds logging for unexpected error cases.

(See #4944 for the corresponding PR for 1.3/1.4).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5174 from JoshRosen/executorclassloaderleak-branch-1.2 and squashes the following commits:

16e38fe [Josh Rosen] [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang. See SPARK-6209 for more details, including a bug reproduction.
This patch fixes this issue by ensuring proper cleanup of these resources. It also adds logging for unexpected error cases.
(See #4944 for the corresponding PR for 1.3/1.4).
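The cleanup pattern described above can be illustrated with a simplified Scala sketch. This is not the code from the patch: `ClassFetchCleanup`, `openClassStream`, and `readClassBytes` are hypothetical names, logging is reduced to `System.err`, and the HTTP handling assumes a plain `java.net.HttpURLConnection`. It shows two ideas from the commit list: a non-200 response closes the error stream (so the keep-alive connection can be reused) before reporting `ClassNotFoundException`, and the class-file stream is always closed in a `finally` block, so a failed load cannot leave a partially consumed stream holding a server thread.

```scala
import java.io.{ByteArrayOutputStream, IOException, InputStream}
import java.net.HttpURLConnection
import scala.util.control.NonFatal

// Hypothetical helper loosely modeled on the SPARK-6209 cleanup; not the patch's code.
object ClassFetchCleanup {

  /** Returns the body stream for a 200 response. For any other status, closes the
    * error stream (keeping the connection eligible for keep-alive reuse) and throws
    * ClassNotFoundException. Any other failure disconnects before rethrowing. */
  def openClassStream(conn: HttpURLConnection, className: String): InputStream = {
    try {
      if (conn.getResponseCode != HttpURLConnection.HTTP_OK) {
        val err = conn.getErrorStream // may be null if the server sent no error body
        if (err != null) {
          try err.close()
          catch { case e: IOException => System.err.println(s"Error closing error stream: $e") }
        }
        throw new ClassNotFoundException(className)
      }
      conn.getInputStream
    } catch {
      case e: ClassNotFoundException => throw e // expected miss: keep the connection
      case NonFatal(e) =>
        conn.disconnect() // unexpected failure: do not leave a half-used connection
        throw e
    }
  }

  /** Reads every byte from the stream produced by `open`, closing it even when
    * reading fails part-way through. Returns None if the class could not be loaded. */
  def readClassBytes(open: () => InputStream): Option[Array[Byte]] = {
    var in: InputStream = null
    try {
      in = open()
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      Some(out.toByteArray)
    } catch {
      case _: ClassNotFoundException => None // class is simply not on the server
      case NonFatal(e) =>
        System.err.println(s"Unexpected error while loading class: $e")
        None
    } finally {
      if (in != null) {
        try in.close()
        catch { case e: IOException => System.err.println(s"Error closing stream: $e") }
      }
    }
  }
}
```

In this sketch a lookup miss keeps the connection open while an unexpected error disconnects it. That split is a design choice under the stated assumptions: misses are routine for a class loader, so preserving keep-alive reuse for them avoids opening a new connection per missing class, whereas an I/O failure leaves the connection in an unknown state.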