[ZEPPELIN-1730, 1587] add spark impersonation through --proxy-user option #1840

Closed

khalidhuseynov wants to merge 5 commits into apache:master from khalidhuseynov:feat/spark-proxy-user

@khalidhuseynov (Member) commented Jan 4, 2017

What is this PR for?

This adds Spark impersonation using the `--proxy-user` option of `spark-submit`. Note that it also makes Spark impersonation possible without the logged-in user having to exist as a system user with SSH configured.
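The core idea can be sketched in a few lines of shell. This is a minimal, hypothetical sketch, not the actual change to Zeppelin's `interpreter.sh`: `build_spark_submit_cmd` and `IMPERSONATE_USER` are illustrative names and `app.jar` is a placeholder; only `spark-submit` and its `--proxy-user` and `--master` options are real. When a user is being impersonated, `--proxy-user <user>` is added to the `spark-submit` options so that Spark runs the application on behalf of the logged-in user.

```shell
#!/usr/bin/env bash
# Minimal sketch, not Zeppelin's interpreter.sh.
# IMPERSONATE_USER is a hypothetical variable holding the logged-in
# Zeppelin user; empty or unset means "no impersonation".

# Print (rather than run) the spark-submit command line.
# Arguments: further spark-submit options followed by the application jar.
build_spark_submit_cmd() {
  local -a cmd=(spark-submit)
  if [[ -n "${IMPERSONATE_USER:-}" ]]; then
    # Ask Spark to run the application on behalf of this user.
    cmd+=(--proxy-user "$IMPERSONATE_USER")
  fi
  # The caller's options and the jar come last; spark-submit expects
  # its own options before the application jar.
  cmd+=("$@")
  echo "${cmd[*]}"
}

# Usage: prints "spark-submit --proxy-user alice --master yarn app.jar"
IMPERSONATE_USER=alice build_spark_submit_cmd --master yarn app.jar
```

Without an impersonated user the same call yields a plain `spark-submit --master yarn app.jar`, so non-impersonated setups are unaffected.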

What type of PR is it?

Improvement

Todos

  • [x] add `--proxy-user`
  • [x] try on standalone Spark 1.6.2
  • [x] try on Spark 2.0.1 in yarn-client mode

What is the Jira issue?

Directly solves ZEPPELIN-1730, and also solves ZEPPELIN-1587 per the discussion in #1566, since using `--proxy-user` with `spark-submit` is the preferable method.

How should this be tested?

  1. switch your Spark interpreter to `per user` and `isolated` mode
  2. enable the user impersonation flag for that interpreter
  3. run some job using that Spark interpreter
  4. the SparkContext should be created with the currently logged-in user's credentials, on behalf of the system user

Screenshots (if appropriate)

standalone: [screenshot: spark_sc_impersonation]

yarn-client: [screenshot: screen shot 2017-01-04 at 10 00 13 am]

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? Yes

@khalidhuseynov changed the title from "[ZEPPELIN-1730, 1587] WIP add spark impersonation through --proxy-user option" to "[ZEPPELIN-1730, 1587] add spark impersonation through --proxy-user option" Jan 4, 2017
@khalidhuseynov reopened this Jan 4, 2017
@khalidhuseynov (Member, Author)

This is ready for review. @prabhjyotsingh, please help review as the original author; also @zjffdu, @astroshim and @Leemoonsoo as a follow-up from #1566. The CI failure in the first profile is unrelated; it is caused by the Apache RAT check problem tracked in ZEPPELIN-1850.

@zjffdu (Contributor) commented Jan 5, 2017

@khalidhuseynov Have you tried it on a secured cluster? IIRC, `--proxy-user` cannot be used together with `--principal` and `--keytab`, which means that on a secured cluster the user has to run `kinit` instead of passing `--principal` and `--keytab`. This might not be what users expect.

@khalidhuseynov (Member, Author)

@zjffdu I haven't tried a secured cluster yet, but according to the Spark documentation, `spark-submit` indeed doesn't allow `--principal` and `--keytab` alongside `--proxy-user`, because of the security risk of exposing the keytab. Possible solutions could then be:

  1. the user sets `export ZEPPELIN_IMPERSONATE_CMD` here to `kinit <principal>@<REALM> -k -t <keytab file>`, which is then run before `spark-submit`
  2. don't use `--proxy-user` in cluster mode
  3. other suggestions
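Option 1 implies a fixed order of operations: obtain a ticket from the keytab first, then call `spark-submit` with `--proxy-user` but without `--principal`/`--keytab`. The shell sketch below shows only that order, under assumptions: `launch_with_keytab` and `run_or_print` are hypothetical helpers, the principal, keytab path and `app.jar` are placeholders, and it does not model how Zeppelin actually consumes `ZEPPELIN_IMPERSONATE_CMD`. With `DRY_RUN=1` it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of "option 1": kinit from a keytab, then spark-submit with
# --proxy-user only. launch_with_keytab and run_or_print are hypothetical.

# Run the given command, or only print it when DRY_RUN=1.
run_or_print() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: principal, keytab file, user to impersonate, then the
# remaining spark-submit options and the application jar.
launch_with_keytab() {
  local principal="$1" keytab="$2" proxy_user="$3"
  shift 3
  # 1. Obtain a Kerberos ticket for the service principal from its keytab.
  run_or_print kinit -k -t "$keytab" "$principal" || return 1
  # 2. Submit on behalf of the logged-in user. No --principal/--keytab
  #    here, since spark-submit does not accept them with --proxy-user.
  run_or_print spark-submit --proxy-user "$proxy_user" "$@"
}
```

For example, `DRY_RUN=1 launch_with_keytab zeppelin@EXAMPLE.COM /tmp/zeppelin.keytab alice --master yarn app.jar` prints the `kinit -k -t /tmp/zeppelin.keytab zeppelin@EXAMPLE.COM` line followed by `spark-submit --proxy-user alice --master yarn app.jar`.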

@Tagar (Contributor) commented Jan 5, 2017

Thank you, @khalidhuseynov. On:

> user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit

The only problem I see with this option is that Kerberos tickets will not be renewed automatically and will eventually expire.

@zjffdu (Contributor) commented Jan 6, 2017

> user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit

One concern is that this requires all interpreters of one user to share the same keytab/principal. For example, the Spark interpreter may affect the shell interpreter if they use different keytabs/principals for the same user. In the long term, we may need to put security-related settings in one central place rather than in each interpreter setting.

@khalidhuseynov (Member, Author)

@zjffdu I agree about bringing security-related features together in the longer term; possibly the Credentials menu could be used for that.
Also, regarding the previously discussed use of `--proxy-user` in yarn-cluster mode, I believe it's currently not supported in Zeppelin. As far as I know, only standalone and yarn-client modes are supported by the pure Spark interpreter.
@Tagar right, if used that way, Kerberos tickets wouldn't be renewed automatically. However, as I said, I think the Spark interpreter doesn't support yarn-cluster mode, so using `ZEPPELIN_IMPERSONATE_CMD` with `kinit` wouldn't be required in that case.

Also, anyone with a Kerberized yarn-cluster setup is more than welcome to test it :)

@khalidhuseynov force-pushed the feat/spark-proxy-user branch 3 times, most recently from 5c06974 to 892b7e4, January 10, 2017 07:02
@khalidhuseynov (Member, Author)

Also, a review from @Leemoonsoo on this one would be helpful.

@Tagar (Contributor) commented Jan 10, 2017

As far as credential refresh is concerned, please see the new comments in SPARK-19143.
Hope this helps.

@khalidhuseynov (Member, Author)

I just pushed changes that keep compatibility via the `ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER` environment variable, which can be used to disable the `--proxy-user` option. Once SPARK-19143 is resolved, maybe we can come back to this.
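A sketch of the opt-out, assuming the variable is set in `conf/zeppelin-env.sh` (the squashed commits mention adding it to `env.sh`) and that the value `false` is what turns `--proxy-user` off:

```shell
# conf/zeppelin-env.sh (sketch)
# Do not pass --proxy-user to spark-submit when launching the Spark
# interpreter with impersonation enabled.
# Assumption: the value "false" disables the option.
export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER=false
```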

@asfgit closed this in 5e0aacf Jan 12, 2017
asfgit pushed a commit that referenced this pull request Jan 12, 2017

Author: Khalid Huseynov <khalidhnv@gmail.com>

Closes #1840 from khalidhuseynov/feat/spark-proxy-user and squashes the following commits:

e4251de [Khalid Huseynov] update doc with env var
dc61cae [Khalid Huseynov] check for env spark_proxy in interpreter.sh
8b66740 [Khalid Huseynov] add spark_proxy_user to env.sh
892b7e4 [Khalid Huseynov] add note in docs
4c3dba9 [Khalid Huseynov] add --proxy-user option for spark

(cherry picked from commit 5e0aacf)
Signed-off-by: Jongyoul Lee <jongyoul@apache.org>
@jongyoul (Member)

Merged into master and branch-0.7.
