ZEPPELIN-4176. Remove old spark interpreter by zjffdu · Pull Request #3375 · apache/zeppelin

zjffdu · 2019-06-03T06:11:08Z

What is this PR for?

This PR removes the old Spark interpreter. The old interpreter has several issues, and a new Spark interpreter implementation was introduced in 0.8; this ticket removes the old one in 0.9. The issues with the old Spark interpreter are:

1. It does not use the native Scala shell API.
2. Dependency management is not applied in yarn cluster mode.
3. It cannot support Scala 2.12 because of point 1.

What type of PR is it?

[ Improvement ]

Todos

- Task

What is the Jira issue?

https://jira.apache.org/jira/browse/ZEPPELIN-4176

How should this be tested?

CI pass

Screenshots (if appropriate)

Questions:

Do the license files need an update? No
Are there breaking changes for older versions? No
Does this need documentation? No

felixcheung · 2019-06-04T03:01:56Z

+        // zeppelin.spark.useHiveContext & zeppelin.spark.concurrentSQL are legacy zeppelin
+        // properties, convert them to spark properties here.
+        if (entry.getKey().toString().equals("zeppelin.spark.useHiveContext")) {
+          conf.set("spark.useHiveContext", entry.getValue().toString());


I don't recall spark.useHiveContext being a property name in Spark. In any case, this isn't supported in Spark today, is it?

Yes, it is not supported today; it is kept only for legacy support and will be removed in the future.
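For readers following the thread, the conversion under discussion can be sketched roughly as below. This is a minimal illustration, not the code in the PR: the class name `LegacyPropertyMapper` is hypothetical, a plain `Map` stands in for `SparkConf` so the sketch has no Spark dependency, and the `FAIR` value for `spark.scheduler.mode` is an assumption about what enabling concurrent SQL is meant to select.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; a Map stands in for SparkConf.
final class LegacyPropertyMapper {

  /** Copies the two legacy zeppelin.spark.* properties into Spark-style keys. */
  static Map<String, String> toSparkConf(Properties props) {
    Map<String, String> conf = new LinkedHashMap<>();
    for (Map.Entry<Object, Object> entry : props.entrySet()) {
      String key = entry.getKey().toString();
      String value = entry.getValue().toString();
      if (key.equals("zeppelin.spark.useHiveContext")) {
        conf.put("spark.useHiveContext", value);
      }
      if (key.equals("zeppelin.spark.concurrentSQL") && value.equalsIgnoreCase("true")) {
        // Assumption: concurrent SQL corresponds to the FAIR scheduler mode.
        conf.put("spark.scheduler.mode", "FAIR");
      }
    }
    return conf;
  }
}
```

Any property other than the two legacy keys is ignored by this sketch; the real interpreter also forwards ordinary `spark.*` settings.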

felixcheung · 2019-06-04T03:02:30Z

+        if (entry.getKey().toString().equals("zeppelin.spark.useHiveContext")) {
+          conf.set("spark.useHiveContext", entry.getValue().toString());
+        }
+        if (entry.getKey().toString().equals("zeppelin.spark.concurrentSQL")


Should we set this only for the SQL interpreter? It might have unintended effects on the non-SQL ones.

I am afraid not, because it sets spark.scheduler.mode, which needs to be set when the driver starts, and the driver is started in SparkInterpreter.

Then maybe we should rename and deprecate this one in the next release. IIRC, people have complained about paragraph execution order changing and breaking things, so if this affects all Spark interpreters and not just SQL, the risk of that is higher.

This would not affect the execution order of Spark Scala code, because the Spark Scala interpreter uses FIFOScheduler. Only SparkSqlInterpreter is affected, as it uses ParallelScheduler: https://github.com/apache/zeppelin/blob/master/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L128
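The difference between the two scheduling styles can be illustrated with plain `java.util.concurrent` executors. This is only an analogy under stated assumptions: a single-thread executor stands in for a FIFO scheduler and a fixed thread pool stands in for a parallel one; neither is Zeppelin's actual `FIFOScheduler` or `ParallelScheduler`, and `SchedulerOrderSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: executors stand in for Zeppelin's schedulers.
final class SchedulerOrderSketch {

  /** Submits `count` numbered paragraphs and returns the order in which they finish. */
  static List<Integer> run(ExecutorService executor, int count) {
    List<Integer> finished = Collections.synchronizedList(new ArrayList<>());
    for (int i = 0; i < count; i++) {
      final int id = i;
      executor.submit(() -> {
        try {
          // Earlier paragraphs sleep longer, so they tend to finish last when run concurrently.
          Thread.sleep((count - id) * 20L);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        finished.add(id);
      });
    }
    executor.shutdown();
    try {
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("paragraphs did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(finished);
  }

  public static void main(String[] args) {
    // One worker thread: paragraphs finish in submission order.
    System.out.println("fifo:     " + run(Executors.newSingleThreadExecutor(), 4));
    // Several worker threads: finish order depends on timing.
    System.out.println("parallel: " + run(Executors.newFixedThreadPool(4), 4));
  }
}
```

With one worker the completion order always matches the submission order; with several workers it is timing-dependent, which is the kind of reordering the review comment is worried about.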

felixcheung · 2019-06-04T03:03:29Z

+      this.innerInterpreter.bind("z", z.getClass().getCanonicalName(), z,
+          Lists.newArrayList("@transient"));
+    } catch (Exception e) {
+      LOGGER.error("Fail to open SparkInterpreter", ExceptionUtils.getStackTrace(e));


log e instead of ExceptionUtils.getStackTrace(e)?
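The point of this comment is that the exception object itself should be handed to the logger rather than a pre-rendered string; with SLF4J that would presumably be `LOGGER.error("Fail to open SparkInterpreter", e)`. The sketch below shows the same idea with `java.util.logging` so that it stays self-contained; the class and handler names are hypothetical and it is not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging as a stand-in for SLF4J.
final class LogThrowableSketch {

  /** Collects records in memory so the attached Throwable can be inspected. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }
  }

  /** Logs the failure with the exception attached and returns the captured record. */
  static LogRecord logFailure(Exception e) {
    Logger logger = Logger.getAnonymousLogger();
    logger.setUseParentHandlers(false);
    CapturingHandler handler = new CapturingHandler();
    logger.addHandler(handler);
    // Pass the exception itself; the record keeps the Throwable and its stack trace.
    logger.log(Level.SEVERE, "Fail to open SparkInterpreter", e);
    return handler.records.get(0);
  }
}
```

Because the record carries the `Throwable`, any handler or formatter can render the stack trace itself, and nothing is lost if the message has no placeholder for an extra string argument.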

felixcheung · 2019-06-04T03:04:04Z

+    if (scalaVersionString.contains("version 2.10")) {
+      return "2.10";
+    } else {
+      return "2.11";


this could break with scala 2.12

Right, I will fix it in #3034
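One way to avoid the fall-through to "2.11" is to parse the major and minor numbers out of the version string instead of matching a single known value. The following is a rough sketch, not the fix that landed: `ScalaVersionSketch` is a hypothetical name, and it assumes the input looks like `version 2.12.8`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes strings of the form "version <major>.<minor>.<patch>".
final class ScalaVersionSketch {

  private static final Pattern VERSION = Pattern.compile("version (\\d+)\\.(\\d+)");

  /** Returns the major.minor part of the Scala version string, e.g. "2.12". */
  static String extractScalaVersion(String scalaVersionString) {
    Matcher m = VERSION.matcher(scalaVersionString);
    if (!m.find()) {
      // Fail loudly instead of silently assuming a default version.
      throw new IllegalArgumentException("Unrecognized Scala version: " + scalaVersionString);
    }
    return m.group(1) + "." + m.group(2);
  }
}
```

Throwing on an unrecognized string is a design choice of this sketch; it surfaces an unexpected Scala version early rather than loading classes for the wrong one.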

felixcheung · 2019-06-09T03:32:13Z

I don't see any blockers other than the comments on 2.12 and fair scheduler mode.

xunliu · 2019-06-10T07:09:49Z

-      try {
-        String keytab = getProperties().getProperty("spark.yarn.keytab");
-        String principal = getProperties().getProperty("spark.yarn.principal");
-        UserGroupInformation.loginUserFromKeytab(principal, keytab);


This sets the Kerberos authentication information according to the configuration.
Can it be added to the new Spark interpreter?
It is very useful.

This is legacy code for OldSparkInterpreter. At that time, we didn't pass the Spark conf via --conf in spark-submit. Now we do that correctly in SparkInterpreterLauncher, so we don't need to do it again.
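To make the reply concrete: once every `spark.*` property is forwarded to spark-submit as a `--conf key=value` pair, settings such as `spark.yarn.keytab` and `spark.yarn.principal` reach Spark without an explicit login call in the interpreter. The sketch below only illustrates that forwarding; `SparkSubmitConfSketch` is a hypothetical name, the keytab path and principal in the usage are made up, and the actual logic in SparkInterpreterLauncher may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sketch of forwarding spark.* properties to spark-submit.
final class SparkSubmitConfSketch {

  /** Turns every property whose key starts with "spark." into a "--conf key=value" pair. */
  static List<String> toConfArgs(Properties props) {
    List<String> args = new ArrayList<>();
    // Sort the keys so the generated argument order is deterministic.
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      if (key.startsWith("spark.")) {
        args.add("--conf");
        args.add(key + "=" + props.getProperty(key));
      }
    }
    return args;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("spark.yarn.keytab", "/path/to/zeppelin.keytab");   // made-up path
    props.setProperty("spark.yarn.principal", "zeppelin@EXAMPLE.COM");    // made-up principal
    props.setProperty("zeppelin.spark.maxResult", "1000");                // not forwarded
    System.out.println(String.join(" ", toConfArgs(props)));
  }
}
```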

### What is this PR for?
When Zeppelin is running in Kubernetes, "View in Spark web UI" gives internal address, instead of address defined in SERVICE_DOMAIN. I think this problem is side effect of #3375 and this PR includes fix and updated unittest.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-4226

### How should this be tested?
Run Zeppelin on kubernetes, and run spark job, click "View in Spark web UI" button.

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: Lee moon soo <moon@apache.org>

Closes #3451 from Leemoonsoo/ZEPPELIN-4226 and squashes the following commits:

7e34542 [Lee moon soo] use StringUtils.isBlank
a33c3b2 [Lee moon soo] pickup SparkUI address from zeppelin.spark.uiWebUrl

ZEPPELIN-4176. Remove old spark interpreter

0578f4a

felixcheung reviewed Jun 4, 2019

address comment

4efa61f

xunliu reviewed Jun 10, 2019

asfgit closed this in f5ee329 Jun 12, 2019

Leemoonsoo mentioned this pull request Sep 18, 2019

[ZEPPELIN-4226] Fix "View in Spark web UI" in kubernetes mode #3451

Closed
