HADOOP-17125. Using snappy-java in SnappyCodec #2297

Merged
steveloughran merged 24 commits into apache:trunk from viirya:java-snappy
Oct 6, 2020
@viirya

viirya commented Sep 10, 2020

Member

See https://issues.apache.org/jira/browse/HADOOP-17125 for details.

Discussed offline with @dbtsai and submitted this based on #2201.

@@ -1,166 +0,0 @@
/*

Member

Per #2201 (comment): is this native code used in hadoop-mapreduce-client-nativetask? If so, we probably need to keep it for now.

Member Author

Hmm, because we removed the native methods from the Java files, I think the .h files needed for compilation are no longer generated: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt

[WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2297/src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.c:32:10: fatal error: org_apache_hadoop_io_compress_snappy_SnappyDecompressor.h: No such file or directory
[WARNING]  #include "org_apache_hadoop_io_compress_snappy_SnappyDecompressor.h"
[WARNING]           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[WARNING] compilation terminated.

Member Author

Btw, I don't see them being used in hadoop-mapreduce-client-nativetask, unless I missed something. Let's wait for the build and tests.

@dbtsai

dbtsai commented Sep 10, 2020

Member

Thanks @viirya for taking over my #2201 and continuing to work on it.

@dbtsai

dbtsai commented Sep 11, 2020

Member

The only test failure is in TestSnappyCompressorDecompressor.testSnappyDirectBlockCompression. I guess it's because in SnappyDirectBlockCompression the compressedByteBuffer is already in read mode, so we don't need to switch it to read mode again in decompressBytesDirect().
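
To illustrate the read-mode point above with plain JDK code (this is not the Hadoop code itself): calling flip() on a ByteBuffer that has already been flipped sets the limit to the current position, which is 0, so nothing is left to read. The class and method names below are hypothetical, and whether this is exactly what happened in the failing test is only the guess stated in the comment.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; it only exercises java.nio.ByteBuffer semantics.
public class FlipTwiceDemo {

  /** Bytes readable after switching a freshly written buffer to read mode once. */
  public static int remainingAfterSingleFlip(byte[] data) {
    ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
    buf.put(data);  // write mode: position == data.length
    buf.flip();     // read mode: limit = data.length, position = 0
    return buf.remaining();
  }

  /** Bytes readable when flip() is called again on a buffer already in read mode. */
  public static int remainingAfterExtraFlip(byte[] data) {
    ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
    buf.put(data);
    buf.flip();     // already in read mode after this call
    buf.flip();     // extra flip: limit = position (0), so the data becomes unreachable
    return buf.remaining();
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3, 4};
    System.out.println(remainingAfterSingleFlip(data)); // 4
    System.out.println(remainingAfterExtraFlip(data));  // 0
  }
}
```

If a caller hands over a buffer that is already in read mode, the callee should read it as-is rather than flipping it again.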

@viirya

viirya commented Sep 11, 2020

Member Author

@dbtsai Yeah, let me look at it today. Hope to pass all tests soon.

@viirya

viirya commented Sep 11, 2020

Member Author

@sunchao I think all tests passed. But there are two -1s; do you know what they mean?

@sunchao

sunchao commented Sep 11, 2020

Member

> @sunchao I think all tests passed. But there are two -1s; do you know what they mean?

@viirya looks like the gcc compilation or checkstyle failed - you can check the test results for cc

@viirya

viirya commented Sep 11, 2020

Member Author

@sunchao Thanks. I saw a checkstyle failure:

./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/snappy/TestSnappyCompressorDecompressor.java:355:    int[] size = { 4 * 1024, 64 * 1024, 128 * 1024, 1024 * 1024 };:18: '{' is followed by whitespace. [NoWhitespaceAfter]

But I didn't change its style in the diff.

@viirya

viirya commented Sep 11, 2020

Member Author

@sunchao

sunchao commented Sep 11, 2020

Member

Yes. I also don't see any error in the log files. I think we can check Yetus repo to see how it decides whether it is a -1 or -0.

cc @jojochuang @aajisaka @steveloughran do you have any idea what caused the CI failure here?

@sunchao

sunchao commented Sep 11, 2020

Member

BTW I think we no longer need the -Drequire.snappy flag and REQUIRE_SNAPPY with this, right?

@viirya

viirya commented Sep 11, 2020

Member Author

> BTW I think we no longer need the -Drequire.snappy flag and REQUIRE_SNAPPY with this, right?

Yes, I plan to remove them once we get rid of the -1.

@viirya

viirya commented Sep 13, 2020

Member Author

Checked the cc warnings and the related code. The code was committed a long time ago (e.g., 2014) and is not touched here. Many of the cc warnings are: warning: dynamic exception specifications are deprecated in C++11 [-Wdeprecated]. I guess either such warnings were not checked when the code was committed, or the compilation tools have been upgraded since. I don't think they are caused by this change; because we removed some .c and .h files, the CI build triggered compilation of the related native code.

So I am not sure which is better: fixing these compilation warnings or ignoring them?

@sunchao

sunchao commented Sep 13, 2020

Member

> So I am not sure which is better: fixing these compilation warnings or ignoring them?

Yeah. Looks to me we can just ignore these for now and proceed to other things in this PR.

@viirya

viirya commented Sep 14, 2020

Member Author

Looks like CI failed to fetch and install yetus? @sunchao do you know how we can re-trigger CI build and testing?

@sunchao

sunchao commented Sep 14, 2020

Member

Just re-triggered the job; let's see what happens.

@viirya

viirya commented Sep 14, 2020

Member Author

It seems it still failed to fetch and install Yetus, and not just for this PR; other PRs also hit it...

@viirya

viirya commented Sep 14, 2020

Member Author

@sunchao Who should we let know about the CI issue?

@sunchao

sunchao commented Sep 14, 2020

Member

@viirya interesting ... I think you can send an email to the Hadoop dev list (common-dev@hadoop.apache.org, you may need to subscribe first).

@viirya

viirya commented Sep 15, 2020

Member Author

OK, seems the CI is working now.

@viirya

viirya commented Sep 15, 2020

Member Author

I ran a benchmark and a compatibility test locally, using SnappyCodec to write and read a ~200MB SequenceFile. Before and after this change, the performance is nearly the same.

For the compatibility test, I wrote a SequenceFile with each of the two SnappyCodec implementations and read it back with the other. The files could be read without problems, and the file sizes were also the same.

@sunchao

sunchao commented Oct 2, 2020

Member

The style issue was fixed in the last run. The CI failed because of unit tests and ASF license (I don't really see the file jobTokenPassword). Seems neither is related to this PR.

@viirya

viirya commented Oct 2, 2020

Member Author

Fixed another, and the last, style issue. Checked with mvn checkstyle:check locally.

@steveloughran steveloughran left a comment

Contributor

yetus failures are all unrelated. One minor tweak suggested to that change on test reporting. I don't like ever losing stacks of nested exceptions, so if you are changing that code, just throw the AssertionError which fail() would normally do, with the caught exception as the cause. Not your fault, I know, but since you are there...

if (ex.getMessage() != null) {
  fail(joiner.join(name, ex.getMessage()));
} else {
  fail(joiner.join(name, ExceptionUtils.getStackTrace(ex)));
}

Contributor

An NPE is why toString() is what new code should use.
Why don't we just throw new AssertionError(name + ex, ex)? That way the stack trace doesn't get lost, which is something we never want to happen.
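
A minimal sketch of the suggestion above, using only the JDK: building the message with name + ex goes through ex.toString(), which is safe even when getMessage() is null, and passing ex as the cause keeps the original stack trace attached. The class name, the wrap helper, and the "compress test: " label are hypothetical and are not taken from the patch.

```java
// Hypothetical demo class showing how a cause survives inside an AssertionError.
public class PreserveCauseDemo {

  /** Wraps a caught exception the way the review comment proposes. */
  public static AssertionError wrap(String name, Throwable ex) {
    // String concatenation calls ex.toString(), so a null message is not a problem,
    // and the second argument keeps ex (with its stack trace) as the cause.
    return new AssertionError(name + ex, ex);
  }

  public static void main(String[] args) {
    Throwable cause = new IllegalStateException(); // deliberately has no message
    AssertionError err = wrap("compress test: ", cause);
    System.out.println(err.getMessage());          // compress test: java.lang.IllegalStateException
    System.out.println(err.getCause() == cause);   // true
  }
}
```

By contrast, fail(message) creates an AssertionError with no cause, so the nested stack trace is dropped unless it is flattened into the message string.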

@saintstack

Contributor

If making a new PR, the 'compile' is redundant given it's the Maven default?

The license failure is:

Lines that start with ????? in the ASF License  report indicate files that do not have an Apache license header:
 !????? /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2297/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword

No harm fixing it as part of this patch... add the 'jobTokenPassword' from below in ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/pom.xml

       <plugin>
         <groupId>org.apache.rat</groupId>
         <artifactId>apache-rat-plugin</artifactId>
         <configuration>
           <excludes>
             <exclude>src/test/java/org/apache/hadoop/cli/data60bytes</exclude>
             <exclude>src/test/resources/job_1329348432655_0001-10.jhist</exclude>
             <exclude>**/jobTokenPassword</exclude>
           </excludes>
         </configuration>
       </plugin>

Otherwise patch is looking good to me.

@saintstack

Contributor

The native compile complaints seem unrelated...

@viirya

viirya commented Oct 5, 2020

Member Author

Thanks @steveloughran and @saintstack. Updated the diff based on your suggestions.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 5m 36s Maven dependency ordering for branch
+1 💚 mvninstall 24m 2s trunk passed
+1 💚 compile 19m 49s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 17m 5s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 2m 56s trunk passed
+1 💚 mvnsite 20m 56s trunk passed
+1 💚 shadedclient 14m 21s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 6m 29s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 7m 8s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 0m 45s Used deprecated FindBugs config; considering switching to SpotBugs.
+0 🆗 findbugs 0m 25s branch/hadoop-project no findbugs output file (findbugsXml.xml)
+0 🆗 findbugs 0m 23s branch/hadoop-project-dist no findbugs output file (findbugsXml.xml)
-0 ⚠️ patch 1m 7s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 35s Maven dependency ordering for patch
+1 💚 mvninstall 21m 15s the patch passed
+1 💚 compile 19m 18s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
-1 ❌ cc 19m 18s /diff-compile-cc-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 generated 40 new + 123 unchanged - 40 fixed = 163 total (was 163)
+1 💚 golang 19m 18s the patch passed
+1 💚 javac 19m 18s the patch passed
+1 💚 compile 17m 9s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
-1 ❌ cc 17m 9s /diff-compile-cc-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 36 new + 127 unchanged - 36 fixed = 163 total (was 163)
+1 💚 golang 17m 9s the patch passed
+1 💚 javac 17m 9s the patch passed
+1 💚 checkstyle 2m 50s root: The patch generated 0 new + 140 unchanged - 3 fixed = 140 total (was 143)
+1 💚 mvnsite 17m 36s the patch passed
+1 💚 shellcheck 0m 0s There were no new shellcheck issues.
+1 💚 shelldocs 0m 18s There were no new shelldocs issues.
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 5s The patch has no ill-formed XML file.
+1 💚 shadedclient 14m 11s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 6m 25s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 7m 9s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 findbugs 0m 23s hadoop-project has no data from findbugs
+0 🆗 findbugs 0m 24s hadoop-project-dist has no data from findbugs
_ Other Tests _
-1 ❌ unit 587m 25s /patch-unit-root.txt root in the patch passed.
+1 💚 asflicense 1m 51s The patch does not generate ASF License warnings.
891m 20s
Reason Tests
Failed junit tests hadoop.yarn.applications.distributedshell.TestDistributedShell
hadoop.crypto.key.kms.server.TestKMS
hadoop.hdfs.TestFileChecksumCompositeCrc
hadoop.hdfs.server.balancer.TestBalancer
hadoop.hdfs.server.namenode.TestFileTruncate
hadoop.hdfs.TestDFSShell
hadoop.hdfs.TestFileChecksum
hadoop.tools.TestDistCpSystem
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/artifact/out/Dockerfile
GITHUB PR #2297
Optional Tests dupname asflicense shellcheck shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient xml cc findbugs checkstyle golang
uname Linux 928304952c17 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6ece640
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/testReport/
Max. process+thread count 4090 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-project-dist hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient . U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/console
versions git=2.17.1 maven=3.6.0 shellcheck=0.4.6 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran

Contributor

> No harm fixing it as part of this patch... add the 'jobTokenPassword' from below in ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/pom.xml

I think it's actually some test runner bug; really it should be cleaned up. But we can pull in the patch to shut it up.

@steveloughran

Contributor

Ok, I'm happy too

+1, merging to trunk and branch-3.3

@steveloughran steveloughran merged commit c9ea344 into apache:trunk Oct 6, 2020
asfgit pushed a commit that referenced this pull request Oct 6, 2020
This switches the SnappyCodec to use the snappy-java codec, rather than the native one.

To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.

This comes in as an avro dependency, so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7.

Contributed by DB Tsai and Liang-Chi Hsieh

Change-Id: Id52a404a0005480e68917cd17f0a27b7744aea4e
@saintstack

Contributor

Thanks for pushing this through @steveloughran +1 on master and branch-3.3.

@viirya

viirya commented Oct 6, 2020

Member Author

@dbtsai

dbtsai commented Oct 6, 2020

Member

Thanks all for helping and pushing this through! This will simplify how people deploy snappy native lib greatly.

@steveloughran

Contributor

JIRA on apache is offline & updated -we need to remember to update that, including something in the release notes

@viirya

viirya commented Oct 6, 2020

Member Author

Ok, got it. I will update release notes once it is back. Seems I cannot update Hadoop JIRA.

@viirya

viirya commented Oct 6, 2020

Member Author

Looks like the JIRA is back now? https://issues.apache.org/jira/browse/HADOOP-17125

@steveloughran

Contributor

JIRA closed, added a release note.

@sunchao

sunchao commented Oct 7, 2020

Member

Thanks @steveloughran - could you assign the JIRA to @viirya ?

@steveloughran

Contributor

@viirya ...what's your JIRA username?

@viirya

viirya commented Oct 9, 2020

Member Author

@steveloughran Username is viirya too. Thanks.

@steveloughran

Contributor

@viirya assigned JIRA to you. you are also free to assign any other Hadoop JIRAs to yourself...

@viirya

viirya commented Oct 12, 2020

Member Author

@steveloughran Thank you! I tried to assign this ticket, but it seems I cannot do it.

@steveloughran

Contributor

You needed to be listed in the project settings as someone with the right permissions. It's done now.

@DreamFlyCBX DreamFlyCBX left a comment

I would like to know how to use snappy-java in hadoop to verify that snappy's compression is available
