Skip to content

[ZEPPELIN-1515] Use Hadoop-FileSystem api to implement NotebookRepo#2529

Closed
prabhjyotsingh wants to merge 2 commits into
apache:masterfrom
prabhjyotsingh:ZEPPELIN-1515
Closed

[ZEPPELIN-1515] Use Hadoop-FileSystem api to implement NotebookRepo#2529
prabhjyotsingh wants to merge 2 commits into
apache:masterfrom
prabhjyotsingh:ZEPPELIN-1515

Conversation

@prabhjyotsingh
Copy link
Copy Markdown
Contributor

@prabhjyotsingh prabhjyotsingh commented Aug 14, 2017

What is this PR for?

With this PR, I attempt to use Hadoop-filesystem API to implement NotebookRepo, which in my opinion is very generic in nature, as with this we will be able to support multiple scheme of the URI like local file system file:///, Hadoop file system hdfs://localhost:8020, S3 etc. So far I have tested this only with local and HDFS.

What type of PR is it?

[Improvement]

Todos

  • - Periodic refresh of kerberos ticket.

What is the Jira issue?

How should this be tested?

In zeppelin-site.xml enable these two configs; to write into HDFS.

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.HDFSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

<property>
  <name>zeppelin.hadoop.uri</name>
  <value>hdfs://localhost:8020</value>
  <description>The scheme of the URI determines a configuration property name, fs.scheme.class whose value names the FileSystem class.
  e.g. `hdfs://localhost:8020`, or `file:///`
  </description>
</property>

Screenshots (if appropriate)

Questions:

  • Does the licenses files need an update? N/A
  • Is there breaking changes for older versions? N/A
  • Does this needs documentation? Yes

@prabhjyotsingh prabhjyotsingh changed the title use Hadoop-FileSystem api to implement NotebookRepo [ZEPPELIN-1515] Use Hadoop-FileSystem api to implement NotebookRepo Aug 14, 2017
@prabhjyotsingh
Copy link
Copy Markdown
Contributor Author

@zjffdu @jongyoul @felixcheung @halidhuseynov Please help review this.

@prabhjyotsingh
Copy link
Copy Markdown
Contributor Author

PING

</property>

<property>
<name>kerberos.principal</name>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be prefix with zeppelin.hadoop? there might be other uses of kerberos keytab/principal?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like the key refresh stuff for shell interpreter...

<name>zeppelin.hadoop.uri</name>
<value>hdfs://localhost:8020</value>
<description>The scheme of the URI determines a configuration property name, fs.scheme.class whose value names the FileSystem class.
e.g. `hdfs://localhost:8020`, or `file:///`
Copy link
Copy Markdown
Member

@felixcheung felixcheung Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this might be confused with webhdfs server url? I'm not sure if there is a better name for it than hadoop.uri?


private Long getTimeAsMs(String time) {
if (time == null) {
LOG.error("Cannot convert to time value.", time);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this print out null properly?

return refreshInterval;
}

private Long getTimeAsMs(String time) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add some tests for this function

.schedule(this, getKerberosRefreshInterval(), TimeUnit.MILLISECONDS);
} else {
kinitFailCount++;
LOG.info("runKerberosLogin failed for " + kinitFailCount + " time(s).");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.warn?

if (path == null || path.trim().length() == 0) {
absolutePath = fileSystem.getHomeDirectory().toString();
} else if (path.startsWith("/")) {
absolutePath = fileSystem.getHomeDirectory().toString() + path;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a HDFS path starts with /, eg. /root - shouldn't the absolutePath be /root?

String fileName = f.getPath().getName();
if (fileName.startsWith(".")
|| fileName.startsWith("#")
|| fileName.startsWith("~")) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Leemoonsoo
Copy link
Copy Markdown
Member

How about the compatibility of Hadoop filesystem API against different hadoop versions?
Does it work with various versions without rebuild Zeppelin?

@zjffdu
Copy link
Copy Markdown
Contributor

zjffdu commented Aug 19, 2017

@prabhjyotsingh What's the difference of this PR compared with #2455 ?

@prabhjyotsingh
Copy link
Copy Markdown
Contributor Author

@zjffdu yes you are right, I originally thought I have a different idea, but now I see, this will eventually be that. Will close this then.

@prabhjyotsingh prabhjyotsingh deleted the ZEPPELIN-1515 branch February 25, 2018 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants