[ZEPPELIN-1787] Add an example of Flink Notebook#1758

Closed
AlexanderShoshin wants to merge 7 commits into apache:master from AlexanderShoshin:ZEPPELIN-1787

Conversation

@AlexanderShoshin
Contributor

What is this PR for?

This PR adds an example of batch processing with Flink to the Zeppelin tutorial notebooks. There are no Flink notebooks in the tutorial at the moment.

What type of PR is it?

Improvement

What is the Jira issue?

ZEPPELIN-1787

How should this be tested?

Open the "Using Flink for batch processing" notebook from the "Zeppelin Tutorial" folder and run all paragraphs one by one.

Questions:

  • Do the license files need an update? - no
  • Are there breaking changes for older versions? - no
  • Does this need documentation? - no

@tae-jun
Contributor

tae-jun commented Dec 14, 2016

Link for the note on ZeppelinHub

For convenience :)

@AlexanderShoshin Is this link right?

@AlexanderShoshin
Contributor Author

Yes, but ZeppelinHub does not show all the paragraphs. I will try to find out the reason.

@AhyoungRyu
Contributor

AhyoungRyu commented Dec 15, 2016

@AlexanderShoshin Thanks for your contribution!
While quickly looking through this example note in my Zeppelin, I noticed a couple of things.

  • AFAIK, OSX doesn't have wget by default, so OSX users might need to install it themselves. I would suggest using curl instead of wget to download the datasets. (curl is a built-in command, I guess.)

  • I saw you set the location of the dataset to /tmp/. This can cause something like a "permission denied" error for that directory in Zeppelin, as shown below.

Caused by: org.apache.flink.runtime.JobException: 
Creating the input splits caused an error: 
File /tmp/flights98.csv does not exist or the user running Flink ('ahyoungryu') has insufficient permissions to access it.

@zjffdu
Contributor

zjffdu commented Dec 15, 2016

How about referring to this note in flink.md?

@AlexanderShoshin
Contributor Author

@AhyoungRyu, thank you for your notes!

  • I've changed wget to curl as you suggested.
  • As for the "permission denied" error message, I think it appears because the %sh paragraphs with the data download instructions did not finish correctly. Each paragraph downloads about 70 MB of data and then unpacks it, so they may have been stopped by the timeout. In that case you don't have the /tmp/flights98.csv file at all. I added a recommendation to the notebook to increase the shell.command.timeout.millisecs setting. It helped in my case.
  • We can't store the data sets in the "home" folder because %sh and %flink may be run by different users, so they may have different home folders. The /tmp/ folder is a common folder that can normally be accessed by every user. For the case of limited access, I've added a chmod 666 /tmp/flights<YY>.csv command for each CSV set. Maybe there is a better solution for this issue. I will be glad to receive any suggestions :)
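
The download-and-share step described in the points above could look roughly like the sketch below. The URL is a placeholder and both function names are hypothetical; neither comes from the notebook itself. It assumes curl is installed on the machine that runs the %sh interpreter.

```shell
#!/bin/sh
# Sketch only: DATA_URL is a placeholder, not the notebook's real data source.
DATA_URL="https://example.com/flights98.csv"
DEST="/tmp/flights98.csv"

# download_dataset URL DEST (hypothetical helper)
download_dataset() {
    # -f: fail on HTTP errors, -sS: silent but still print errors,
    # -L: follow redirects, -o: write to the given file
    curl -fsSL -o "$2" "$1"
}

# make_shared FILE (hypothetical helper)
# Give every user read/write access, so that a %flink paragraph running
# under a different account than %sh can still open the file.
make_shared() {
    chmod 666 "$1"
}

# Usage (commented out so that sourcing this file performs no download):
# download_dataset "$DATA_URL" "$DEST" && make_shared "$DEST"
```

Splitting the download from the permission change keeps the chmod step usable on its own, for example when the file was fetched manually.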

@AlexanderShoshin
Contributor Author

@tae-jun, I found that ZeppelinHub can't display new notebooks correctly. This is because the note.json structure changed after ZEPPELIN-212 was merged (two weeks ago). A notebook now has a results attribute (instead of result) to store paragraph results, and it seems that ZeppelinHub can't see it.
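
One rough way to tell which attribute a given note.json uses is a plain text search, sketched below. The helper name is hypothetical, and the check assumes the attribute appears as an ordinary JSON key; it does not parse the JSON, so treat it as a quick heuristic only.

```shell
#!/bin/sh
# note_format FILE (hypothetical helper)
# Prints "results" if the file contains the newer attribute, "result" if it
# only contains the older one, and "unknown" otherwise.
# This is a text match, not a JSON parse.
note_format() {
    if grep -q '"results"[[:space:]]*:' "$1"; then
        echo "results"
    elif grep -q '"result"[[:space:]]*:' "$1"; then
        echo "result"
    else
        echo "unknown"
    fi
}

# Usage: note_format path/to/note.json
```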

@AlexanderShoshin
Contributor Author

@zjffdu, should we do this in the current PR or open a new issue for the documentation improvement?

@AhyoungRyu
Contributor

Thank you for such a precise explanation. Tested again and it works well!
BTW, some markdown paragraphs are not shown, as in the screenshot below, so I didn't notice that there are descriptions in them (I need to click "show editor" to check what they contain).

[Screenshot, 2016-12-16 2:43:53 pm]

It would be better to show the results of the other description paragraphs by default, as the first paragraph does :)

@AlexanderShoshin
Contributor Author

Oh, it should not look like this. It might be another problem with the new notebook JSON structure. I will correct this.

@AlexanderShoshin
Contributor Author

I've converted the notebook to the 0.6.2 format, so it should be displayed correctly now.
The ZeppelinHub view also works.

@AhyoungRyu
Contributor

Thanks for your quick update!
Since #1780 is trying to update all the tutorial notes' format to the latest one, I don't think keeping an older note format is the best solution. I'm not sure, but as you said, I assume there is another problem with the new note JSON format itself. (If so, we need to fix this, but in another PR :D ) #1780 also has the same problem, as I noted in a comment there.

@AhyoungRyu
Contributor

@AlexanderShoshin Actually, it was my bad. I checked again and this commit works perfectly. Please ignore my last comment, and sorry for the confusion.

@bzz
Member

bzz commented Dec 20, 2016

Is there any other feedback, or has everything been addressed so that we can merge it now?

Saw @zjffdu's comment:

How about referring to this note in flink.md?

@AlexanderShoshin do you think it's worth updating the interpreter documentation under ./docs as well, to mention this example notebook?

Other than that, looks great to me!

@AlexanderShoshin
Contributor Author

I am confused a bit :)
@AhyoungRyu, do I need to drop my last commit or not?

@AhyoungRyu
Contributor

AhyoungRyu commented Dec 20, 2016

@AlexanderShoshin Right that would be better. So sorry about that 😭

@AlexanderShoshin
Contributor Author

AlexanderShoshin commented Dec 20, 2016

@bzz, I am not sure. We already have a Word Count example there to describe Flink usage. Where should we place a link to this example?

@rawkintrevo
Contributor

rawkintrevo commented Dec 20, 2016

@AlexanderShoshin I think what @zjffdu and @bzz meant was that you could mention this notebook somewhere like docs/interpreter/flink.md, which would make it easier for new users to find it.

@AhyoungRyu
Contributor

@AlexanderShoshin It seems the two things below need to be done before merging:

  1. Revert the last commit, 9013620.

You can simply do it locally:

# please make sure you are on branch ZEPPELIN-1787 first
$ git checkout ZEPPELIN-1787

# check whether the last commit is "convert notebook to 0.6.2 format"
$ git log

# if so,
$ git reset HEAD^ --hard

# check again that the reset worked; HEAD should now point to this commit: "add download instruction, change "wget" to "curl""
$ git log

# push with a forced update!
$ git push origin -f ZEPPELIN-1787

  2. Mention this tutorial note in the Flink docs (docs/interpreter/flink.md), as @rawkintrevo said.
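
To see what the reset step does before running it on the real branch, it can be tried in a throwaway repository, as in the sketch below. The temporary directory, the "Demo" identity, and the empty commits are assumptions made for illustration; only the two commit subjects come from this PR.

```shell
#!/bin/sh
# Sketch: reproduce `git reset HEAD^ --hard` in a scratch repository.
demo_repo=$(mktemp -d)
git init -q "$demo_repo"

# demo_commit SUBJECT (hypothetical helper)
# Creates an empty commit; the identity is passed per command so that the
# global git configuration is left untouched.
demo_commit() {
    git -C "$demo_repo" -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q --allow-empty -m "$1"
}

demo_commit 'add download instruction, change "wget" to "curl"'
demo_commit 'convert notebook to 0.6.2 format'

# Drop the most recent commit and move the branch back to its parent.
git -C "$demo_repo" reset -q --hard HEAD^

# The subject of the commit that HEAD now points to.
last_subject=$(git -C "$demo_repo" log -1 --format=%s)
echo "$last_subject"
```

Because the reset rewrites the branch history, the later push has to be forced, which is why the instructions use git push with -f.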

Please feel free to ping me if you need any help!

@AlexanderShoshin
Contributor Author

Sorry, I was not able to work on this issue during the last few weeks.
@AhyoungRyu, I will make the corrections soon.
Thanks

@AlexanderShoshin
Contributor Author

I've added a new commit to convert the notebook to the 0.7.0 format because I found that reverting commit 9013620 causes several conflicts in the JSON file.
I've also merged master to avoid conflicts in flink.md.

Comment thread docs/interpreter/flink.md Outdated

## How to test it's working
In example, by using the [Zeppelin notebook](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL05GTGFicy96ZXBwZWxpbi1ub3RlYm9va3MvbWFzdGVyL25vdGVib29rcy8yQVFFREs1UEMvbm90ZS5qc29u) is from Till Rohrmann's presentation [Interactive data analysis with Apache Flink](http://www.slideshare.net/tillrohrmann/data-analysis-49806564) for Apache Flink Meetup.
You can try the [Flink usage example](http://localhost:8080/#/notebook/2C35YU814) from the tutorial folder or a word count example, by using the [Zeppelin notebook](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL05GTGFicy96ZXBwZWxpbi1ub3RlYm9va3MvbWFzdGVyL25vdGVib29rcy8yQVFFREs1UEMvbm90ZS5qc29u) from Till Rohrmann's presentation [Interactive data analysis with Apache Flink](http://www.slideshare.net/tillrohrmann/data-analysis-49806564) for Apache Flink Meetup.
Contributor

I'm not sure adding a localhost:8080 link is a good idea, since some people might be using another host and port. "You can try Flink usage example located under Zeppelin Tutorial folder." will be enough, I guess :)

@AhyoungRyu
Contributor

@AlexanderShoshin Checked again and it looks nice. Except the minor suggestion, LGTM. Thanks for your effort and the awesome tutorial note!

@AlexanderShoshin
Contributor Author

@AhyoungRyu, thanks for the note. I've removed the URL to the notebook.

@AhyoungRyu
Contributor

AhyoungRyu commented Jan 11, 2017

@AlexanderShoshin Thanks! I will merge this into both master and branch-0.7 if there are no more comments.

@asfgit asfgit closed this in 0da08d1 Jan 12, 2017
asfgit pushed a commit that referenced this pull request Jan 12, 2017
Author: Alexander Shoshin <Alexander_Shoshin@epam.com>

Closes #1758 from AlexanderShoshin/ZEPPELIN-1787 and squashes the following commits:

83cbffb [Alexander Shoshin] remove localhost url
5255e17 [Alexander Shoshin] Merge branch 'master' into ZEPPELIN-1787
0b9df56 [Alexander Shoshin] add a link for this notebook to Zeppelin documentation
593c47d [Alexander Shoshin] convert notebook to 0.7.0 format
9013620 [Alexander Shoshin] convert notebook to 0.6.2 format
fe2a39e [Alexander Shoshin] add download instruction, change "wget" to "curl"
f64b60a [Alexander Shoshin] [ZEPPELIN-1787] Add an example of Flink Notebook

(cherry picked from commit 0da08d1)
Signed-off-by: ahyoungryu <ahyoungryu@apache.org>