[ZEPPELIN-1587] (WIP) Add impersonation routine in SparkInterpreter for current user #1566
khalidhuseynov wants to merge 4 commits into
|
I don't know how much this helps with impersonation. At least it doesn't affect the executor. Do you see any impact on the driver side? |
Branch force-pushed from c496c97 to 8bbb64f.
|
@zjffdu I updated the description and pushed some changes. As you can see from the screenshot, it makes use of current |
|
Would using --proxy-user instead of doAs work? https://issues.apache.org/jira/browse/ZEPPELIN-1730 |
|
@Leemoonsoo yes that's possible; moreover I originally implemented it using in hdfs-site.xml so that you can't run without that configuration, which makes sense. |
|
@khalidhuseynov I think |
|
@zjffdu makes sense. Then I'll change it back to use |
|
@khalidhuseynov How about using |
|
@astroshim thanks for the help, I'll test with |
Branch force-pushed from fc2ee6a to 4547781.
|
So I've done some research, and thanks to @astroshim for the PR and help: that would work in isolated mode. |
|
That's correct, impersonation for the Spark interpreter can only be applied in isolated mode. This is due to what Spark itself supports. |
|
I agree with @khalidhuseynov's opinion. |
…tion

### What is this PR for?
This is to add spark impersonation using --proxy-user option. note that it enables also to use spark impersonation without having logged user as system user with configured ssh.

### What type of PR is it?
Improvement

### Todos
* [x] - add `--proxy-user`
* [x] - try on standalone spark 1.6.2
* [x] - try on yarn-client mode spark 2.0.1

### What is the Jira issue?
Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730) and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587) according to discussion in #1566 since using `--proxy-user` in `spark-submit` is preferable method.

### How should this be tested?
1. switch your spark cluster to `per user` and `isolated` mode
2. set up `user impersonation` flag
3. run some job using that spark interpreter
4. spark context should be created with currently logged in user credentials on behalf of system user

### Screenshots (if appropriate)
standalone

yarn-client
<img width="997" alt="screen shot 2017-01-04 at 10 00 13 am" src="https://cloud.githubusercontent.com/assets/1642088/21653117/75410fde-d264-11e6-886f-11d8b5dbd29e.png">

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? yes

Author: Khalid Huseynov <khalidhnv@gmail.com>

Closes #1840 from khalidhuseynov/feat/spark-proxy-user and squashes the following commits:

e4251de [Khalid Huseynov] update doc with env var
dc61cae [Khalid Huseynov] check for env spark_proxy in interpreter.sh
8b66740 [Khalid Huseynov] add spark_proxy_user to env.sh
892b7e4 [Khalid Huseynov] add note in docs
4c3dba9 [Khalid Huseynov] add --proxy-user option for spark
…tion (same commit message as above; cherry picked from commit 5e0aacf) Signed-off-by: Jongyoul Lee <jongyoul@apache.org>
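
As a rough illustration of the approach in the commit message above, the sketch below shows how a launcher script might append `--proxy-user` to the `spark-submit` arguments, guarded by an environment toggle. `--proxy-user` is a real `spark-submit` option; the function name `proxy_user_args`, the variables `IMPERSONATE_USER` and `USE_SPARK_PROXY_USER`, and the `org.example.Main` / `app.jar` arguments are hypothetical names chosen for this example and are not taken from Zeppelin's scripts.

```shell
# Hypothetical sketch: emit "--proxy-user <name>" when impersonation is wanted.
# IMPERSONATE_USER and USE_SPARK_PROXY_USER are illustrative names only.
proxy_user_args() {
  local user="${IMPERSONATE_USER:-}"
  local enabled="${USE_SPARK_PROXY_USER:-true}"
  # Add the flag only when a user is set and the toggle is not "false".
  if [ -n "$user" ] && [ "$enabled" != "false" ]; then
    printf '%s %s' "--proxy-user" "$user"
  fi
}

# Example: print the command that would be run (spark-submit is not invoked here).
IMPERSONATE_USER="alice"
echo "spark-submit $(proxy_user_args) --class org.example.Main app.jar"
```

With the toggle set to `false`, or with no user set, the function prints nothing and the command falls back to running as the service user.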
What is this PR for?
This adds an impersonation routine to SparkInterpreter, so that any communication with Hadoop HDFS is done with the current user's credentials.
What type of PR is it?
Improvement
Todos
What is the Jira issue?
ZEPPELIN-1587
How should this be tested?
Any HDFS file write should be performed as your logged-in username. For example, I used the following code:
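
The snippet from the original description is not reproduced on this page. As a separate, hypothetical way to check the result from a shell, the sketch below pulls the owner column out of one line of `hdfs dfs -ls` output so it can be compared with the logged-in username. The helper name `owner_of_ls_line`, the sample line, and the path `/tmp/impersonation-test` are assumptions made for illustration only.

```shell
# Hypothetical helper: extract the owner column from one line of `hdfs dfs -ls` output.
owner_of_ls_line() {
  # Fields: permissions, replication, owner, group, size, date, time, path
  printf '%s\n' "$1" | awk '{ print $3 }'
}

# Sample line in the usual `hdfs dfs -ls` layout (illustrative values, not real output).
sample='-rw-r--r--   3 alice supergroup          0 2017-01-04 10:00 /tmp/impersonation-test/part-00000'
owner_of_ls_line "$sample"   # prints: alice
```

On a real cluster, one could feed it a line from something like `hdfs dfs -ls /tmp/impersonation-test | tail -n 1` after writing a file from the Spark interpreter, and expect the logged-in username rather than the service user.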
Screenshots (if appropriate)
Questions: