Skip to content

ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob#2925

Closed
zjffdu wants to merge 1 commit into
apache:masterfrom
zjffdu:ZEPPELIN-3404
Closed

ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob#2925
zjffdu wants to merge 1 commit into
apache:masterfrom
zjffdu:ZEPPELIN-3404

Conversation

@zjffdu
Copy link
Copy Markdown
Contributor

@zjffdu zjffdu commented Apr 13, 2018

What is this PR for?

This bug is introduced by #2914, this PR will set authenticationInfo using its user as we will store user into note.json.

What type of PR is it?

[Bug Fix]

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • Manually tested

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 16, 2018

@felixcheung Mind to help review it ?

// paragraph manually.
p.setAuthenticationInfo(new AuthenticationInfo((p.getUser())));
} else {
p.setAuthenticationInfo(AuthenticationInfo.ANONYMOUS);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, is it better to make sure authenticationInfo is set when scheduling the cronjob?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a more proper place for that :(
Code refactoring is needed for Paragraph/Note for more clean code IMHO

@r-kamath
Copy link
Copy Markdown
Member

LGTM

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 16, 2018

so running via cron will use the last user that executed a paragraph. that does not fit exactly to the docu change from original PR:

Cron executing user (It is removed from 0.8 where it enforces the cron execution user to be the note owner for security purpose)
https://github.com/apache/zeppelin/pull/2914/files#diff-bed1be4fb1f7f6c8e3bd6db18d72dc44

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 17, 2018

Thanks @weand , even I didn't realize the user in note.json means the last user run the paragraph (I thought it is the owner). In that case, it seems meaningless to store user in note.json IMHO.
Regarding the issue, I will update it soon

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 17, 2018

Update the PR to run the cron job as the note owner

if (owners.isEmpty()) {
p.setAuthenticationInfo(AuthenticationInfo.ANONYMOUS);
} else {
p.setAuthenticationInfo(new AuthenticationInfo(owners.iterator().next()));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check if owners.iterator().next() is an empty string?

Copy link
Copy Markdown
Contributor Author

@zjffdu zjffdu Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible ? If it's empty string, then it should be a bug in authorization component.

@mebelousov
Copy link
Copy Markdown
Contributor

Groups can be note owners. As I see AuthenticationInfo is not valid for internal shiro groups.

How can I find the user which scheduled note?

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 17, 2018

@mebelousov Currently I would only allow note owner to set the schedule and run cron job via the owner. But seems zeppelin allow multiple owners for one note (it doesn't make sense to me), this is only what I can do the eliminate the security issue.

@mebelousov
Copy link
Copy Markdown
Contributor

mebelousov commented Apr 17, 2018

@zjffdu I have checked with a group as owner. This is ok.
Why do you think that use of owner is more secure than cronExecutingUser?

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 17, 2018

@zjffdu is there a chance to set the cronExecutionUser implictly to the user who activates the cron schedule.
and in addtion only allow owners (which can be defined by username or groupnames) to activate the cron schedule.

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 17, 2018

and optionally (or alternatively) introduce a config parameter for a static cronExecutionUser (kind of service user) to be defined by ops team. when this config parameter is set, cron schedules will always use that user.

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 18, 2018

@mebelousov What do you mean group as owner ? And how do you use that ?

@mebelousov
Copy link
Copy Markdown
Contributor

@weand cronExecutionUser is set to the user who set the cron string inside the note.
@zjffdu I define groups in shiro.ini.
The user sets groupname in owner field. Then the note runs on cron from this groupname.

image

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 19, 2018

@mebelousov Your solution depends on shiro setting, we could not assume that in code.

@mebelousov
Copy link
Copy Markdown
Contributor

@zjffdu This is your solution ;) I have tested your branch.

Do we have uniform opinion that running document as the group is not good?

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 19, 2018

I am not saying using group is not good, it just depends on user's setting. BTW, it looks like this PR also works for your scenario. In this PR, I would always choose the owner to run the cronjob. And in your case, you just set the owner as group. Do I understand it correctly ? :)

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 19, 2018

@mebelousov I fully agree that running document as the group is not what a user would expect.

That's why I'm proposing another approach again:

  1. prior to [ZEPPELIN-3350] Don't allow set cronExecutionUser #2914 the cronExecutionUser config property was added to org.apache.zeppelin.notebook.Note.config when activating the cron and entering a username
  2. that property is not set anymore. this PR here only sets the authentication info at runtime on the paragrpah to something (a principal) from the permission tab (which can be user or groups, which can not be determined properly) -> org.apache.zeppelin.notebook.Paragraph.setAuthenticationInfo(AuthenticationInfo)
  3. so why not again set that cronExecutionUser config property from 1) (under the hood): with the user who activates the cron schedule expression. when only the owner can enable cron expressions, it will always be the proper owner username regardless of what principals exactly are configured on the permissions tab.

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 20, 2018

Thanks @weand , I agree with you that your proposal seems much easier. I have updated the PR, please help verify it, thanks

if (authenticationInfo != null) {
p.setAuthenticationInfo(authenticationInfo);
}
p.setAuthenticationInfo(authenticationInfo);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yap, I like this better

public synchronized void runAll() {
runAll(null, true);
String cronExecutingUser = (String) getConfig().get("cronExecutingUser");
if (null == cronExecutingUser) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it is matters here, check for empty or blank string?

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 21, 2018

The notebook now runs successfully after restart with the user who enabled the scheduler.

2 remaining issues:

  1. user1 schedules the cron. then the note is executed via cron with 'user1' only as long as nobody else runs the note. as soon as 'user2' runs the note manually, the cron will also use 'user2' from that time on. Endusers would expect the cron will always run as 'user1'.

  2. One more valid but yet uncovered use case left in my mind (dealing with permissions):

  • interpreter permissions have been set only for groups (e.g. DepartmentA should be allowed to run spark interpreter)
  • user created note with %spark interpreter and schedules a cron
  • now when cron runs, only username is added to authentication info. groups of the user (e.g. DepartmentA in that example) are omitted and the notebook fails with an error, that user 'xyz' has no permission for running the interpreter.

any chance to fix that case as well?
I think of something like introducing cronExecutingRoles (in addition to cronExecutingUser) and adding all roles of the user who enables the cron to that config property. And then when a cron runs, all that roles are added to the corresponding authentication info.

@mebelousov
Copy link
Copy Markdown
Contributor

@weand Thank you!
Some addition to 2.
Over time user may not belong to group.
At first we could store cronExecutingRoles and in future it's to be good to check groups on the fly.

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 23, 2018

@weand I don't see the first issue you mentioned, can you confirm that ?
For the second point, I am afraid this is not trivial to fix, I would suggest to defer it to 0.8.1 or 0.9, what do you think ?

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 23, 2018

@zjffdu uhm, I can't confirm it. Today after checking out this PR from scratch my first point now works correctly. sorry for the spam.

created issue ZEPPELIN-3427 for the second point. is that really so complex to wait for 0.8.1 or 0.9.0 ?

Part of that change would be:

  1. moving/adding logic for setting note.config.cronExecutionUser + cronExecutionRoles (new) to org.apache.zeppelin.socket.NotebookServer.isCronUpdated(Map<String, Object>, Map<String, Object>) when cronUpdated = true.
  2. Then evaluate those config properties in org.apache.zeppelin.notebook.Note.runAll() when authenticationInfo is created (already considers cronExecutionUser)

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 24, 2018

Thanks for the hint @weand , I am not familiar with this component and didn't realize it is so not difficult to do that.

PR is updated, please help try that. Thanks again.

$scope.note.config.cronExecutingRoles = $rootScope.ticket.roles;
}
} else {
$scope.note.config.cronExecutingUser = '';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roles should be reset upon disabling cron scheduler, so $scope.note.config.cronExecutingRoles = '' should be added here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Contributor

@weand weand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added two comments.

if (null == cronExecutingUser) {
cronExecutingUser = "anonymous";
}
AuthenticationInfo authenticationInfo = new AuthenticationInfo();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing roles here with split(...) does not work anymore, when having more than one role.
There were recent changes to parsing the roles, see:
ad77265#diff-4a78303414018c7d08a044687ab6afd4R57

This code instead worked locally. AuthenticationInfo constructor will properly parse the roles:

    ...
    AuthenticationInfo authenticationInfo = new AuthenticationInfo(
        cronExecutingUser, 
        StringUtils.isEmpty(cronExecutingRoles) ? null : cronExecutingRoles,
        null);
    runAll(authenticationInfo, true);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed

@zjffdu
Copy link
Copy Markdown
Contributor Author

zjffdu commented Apr 26, 2018

Thanks @weand for review, comments are addressed

@weand
Copy link
Copy Markdown
Contributor

weand commented Apr 26, 2018

great. LGTM 👍 please merge to master and branch-0.8

asfgit pushed a commit that referenced this pull request Apr 26, 2018
…before cronjob

### What is this PR for?

This bug is introduced by #2914, this PR will set authenticationInfo using its user as we will store user into note.json.

### What type of PR is it?
[Bug Fix]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3404

### How should this be tested?
* Manually tested

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes #2925 from zjffdu/ZEPPELIN-3404 and squashes the following commits:

b94ecc9 [Jeff Zhang] ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob

(cherry picked from commit 1cea92c)
Signed-off-by: Jeff Zhang <zjffdu@apache.org>
@asfgit asfgit closed this in 1cea92c Apr 26, 2018
prabhjyotsingh pushed a commit to prabhjyotsingh/zeppelin that referenced this pull request Jul 4, 2018
…before cronjob

This bug is introduced by apache#2914, this PR will set authenticationInfo using its user as we will store user into note.json.

[Bug Fix]

* [ ] - Task

* https://issues.apache.org/jira/browse/ZEPPELIN-3404

* Manually tested

* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2925 from zjffdu/ZEPPELIN-3404 and squashes the following commits:

b94ecc9 [Jeff Zhang] ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob

(cherry picked from commit 1cea92c)
Signed-off-by: Jeff Zhang <zjffdu@apache.org>
(cherry picked from commit eb7969b)

Change-Id: I7c747eaefcc2d7234d0cf07aeaeb26b72c26ee43
mckartha pushed a commit to syntechdev/zeppelin that referenced this pull request Aug 9, 2018
…before cronjob

### What is this PR for?

This bug is introduced by apache#2914, this PR will set authenticationInfo using its user as we will store user into note.json.

### What type of PR is it?
[Bug Fix]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3404

### How should this be tested?
* Manually tested

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2925 from zjffdu/ZEPPELIN-3404 and squashes the following commits:

b94ecc9 [Jeff Zhang] ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob
mckartha pushed a commit to syntechdev/zeppelin that referenced this pull request Aug 9, 2018
…before cronjob

### What is this PR for?

This bug is introduced by apache#2914, this PR will set authenticationInfo using its user as we will store user into note.json.

### What type of PR is it?
[Bug Fix]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3404

### How should this be tested?
* Manually tested

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2925 from zjffdu/ZEPPELIN-3404 and squashes the following commits:

b94ecc9 [Jeff Zhang] ZEPPELIN-3404. Fail to run cronjob when user doesn't run it manually before cronjob

(cherry picked from commit 1cea92c)
Signed-off-by: Jeff Zhang <zjffdu@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants