[ZEPPELIN-1730, 1587] add spark impersonation through --proxy-user option#1840
khalidhuseynov wants to merge 5 commits into
b68a4a0 to 4c3dba9
This is ready for review. @prabhjyotsingh, please help review as the original author; also @zjffdu @astroshim @Leemoonsoo as a follow-up from #1566. The CI failure in the first profile is unrelated and is caused by the RAT problem tracked under ZEPPELIN-1850.
@khalidhuseynov Have you tried it in a secured cluster? IIRC,
@zjffdu I haven't tried secured cluster mode yet, but as I checked the Spark documentation, they indeed don't allow using
Thank you @khalidhuseynov.
The only problem I see with this option is that Kerberos tickets will not be renewed automatically and will expire at some point.
One concern is that this requires all the interpreters of one user to share the same keytab/principal. For example, the spark interpreter may affect the shell interpreter if they use different keytabs/principals for the same user. In the long term, we may need to put security-related settings in one central place rather than in each interpreter setting.
@zjffdu I agree about bringing security-related features together in the longer term. Also, anyone who has a yarn cluster mode setup with Kerberos is more than welcome to test it :)
5c06974 to 892b7e4
Also @Leemoonsoo, a review on this one would be helpful.
As far as credentials refresh is concerned, please see the new comments in SPARK-19143.
I just pushed changes to keep compatibility using
6b9bf03 to e4251de
…tion

### What is this PR for?
This is to add spark impersonation using --proxy-user option. note that it enables also to use spark impersonation without having logged user as system user with configured ssh.

### What type of PR is it?
Improvement

### Todos
* [x] - add `--proxy-user`
* [x] - try on standalone spark 1.6.2
* [x] - try on yarn-client mode spark 2.0.1

### What is the Jira issue?
Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730) and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587) according to discussion in #1566 since using `--proxy-user` in `spark-submit` is preferable method.

### How should this be tested?
1. switch your spark cluster to `per user` and `isolated` mode
2. set up `user impersonation` flag
3. run some job using that spark interpreter
4. spark context should be created with currently logged in user credentials on behalf of system user

### Screenshots (if appropriate)
standalone

yarn-client
<img width="997" alt="screen shot 2017-01-04 at 10 00 13 am" src="https://cloud.githubusercontent.com/assets/1642088/21653117/75410fde-d264-11e6-886f-11d8b5dbd29e.png">

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? yes

Author: Khalid Huseynov <khalidhnv@gmail.com>

Closes #1840 from khalidhuseynov/feat/spark-proxy-user and squashes the following commits:

e4251de [Khalid Huseynov] update doc with env var
dc61cae [Khalid Huseynov] check for env spark_proxy in interpreter.sh
8b66740 [Khalid Huseynov] add spark_proxy_user to env.sh
892b7e4 [Khalid Huseynov] add note in docs
4c3dba9 [Khalid Huseynov] add --proxy-user option for spark

(cherry picked from commit 5e0aacf)
Signed-off-by: Jongyoul Lee <jongyoul@apache.org>
Merged it into master and branch-0.7.
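### What is this PR for?
This adds Spark impersonation using the `--proxy-user` option. Note that it also makes Spark impersonation usable without requiring the logged-in user to exist as a system user with SSH configured.

### What type of PR is it?
Improvement

### Todos
* [x] - add `--proxy-user`
* [x] - try on standalone spark 1.6.2
* [x] - try on yarn-client mode spark 2.0.1

### What is the Jira issue?
Directly solves ZEPPELIN-1730 and also solves ZEPPELIN-1587, following the discussion in #1566, since using `--proxy-user` in `spark-submit` is the preferable method.

### How should this be tested?
1. switch your spark cluster to `per user` and `isolated` mode
2. set up the `user impersonation` flag
3. run some job using that spark interpreter
4. the spark context should be created with the currently logged-in user's credentials on behalf of the system user

### Screenshots (if appropriate)
standalone

yarn-client


### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? yes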
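
For readers unfamiliar with the mechanism, here is a minimal sketch of how a launcher script could append `--proxy-user` to a `spark-submit` command when an impersonated user is set. It is not the actual `interpreter.sh` change from this PR: the function name, its arguments, the `org.example.InterpreterServer` class and `interpreter.jar` are all hypothetical placeholders, and the opt-out switch only mirrors the idea of the environment variable mentioned in the commit list. The script only prints the command; it does not run `spark-submit`.

```shell
# Hypothetical sketch: build (but do not run) a spark-submit command line.
# $1 = impersonated (logged-in) user, may be empty
# $2 = opt-out switch; "false" disables --proxy-user (assumed name and semantics)
build_spark_submit_cmd() {
  impersonate_user="$1"
  use_proxy_user="${2:-true}"

  # Placeholder class and jar names, not the real Zeppelin ones.
  cmd="spark-submit --class org.example.InterpreterServer"

  # Add --proxy-user only when a user is given and the switch is not "false".
  # spark-submit options must come before the application jar.
  if [ -n "$impersonate_user" ] && [ "$use_proxy_user" != "false" ]; then
    cmd="$cmd --proxy-user $impersonate_user"
  fi

  printf '%s\n' "$cmd interpreter.jar"
}

build_spark_submit_cmd "alice"         # includes --proxy-user alice
build_spark_submit_cmd "alice" "false" # opt-out: no --proxy-user
build_spark_submit_cmd ""              # no impersonated user: no --proxy-user
```

With this shape, the process is still launched by the system user, and Spark is asked to act on behalf of the logged-in user, which is why no per-user system account or SSH setup is needed.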