[SPARK-25111][BUILD] increment kinesis client/producer & aws-sdk versions#22099

Closed
steveloughran wants to merge 1 commit into apache:master from steveloughran:cloud/SPARK-25111-kinesis

@steveloughran steveloughran commented Aug 14, 2018

Contributor

This PR has been superseded by #22081

What changes were proposed in this pull request?

Increment the Kinesis client, producer and transitive AWS SDK versions to a more recent release.

This is to help with the move off bouncy castle of #21146 and #22081; the goal is that moving up to the new SDK will allow a JVM with unlimited JCE but without bouncy castle to work with Kinesis endpoints.

Why this specific set of artifacts? It syncs up with the 1.11.271 AWS SDK used by Hadoop 3.0.3, Hadoop 3.1 and Hadoop 3.1.1; that has been stable for the uses there (S3, STS, DynamoDB).

How was this patch tested?

Ran all the external/kinesis-asl tests via Maven with java 8.121 and unlimited JCE, without bouncy castle (#21146), against the default endpoint of us-west-2. Without this SDK update I was getting HTTP cert validation errors; with it they went away.
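
For anyone reproducing that setup, here is a minimal sketch of how the JVM state could be checked before running the tests. The class name `JceCheck` and its methods are hypothetical helpers, not part of Spark or the AWS SDK. It assumes that "unlimited JCE" shows up as `Cipher.getMaxAllowedKeyLength("AES")` returning `Integer.MAX_VALUE`, and that a registered bouncy castle provider uses its usual name, "BC".

```java
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import javax.crypto.Cipher;

/** Hypothetical helper: reports the crypto setup of the running JVM. */
final class JceCheck {
    private JceCheck() {}

    /** Largest AES key size (in bits) the installed policy allows. */
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available in this JVM", e);
        }
    }

    /** True when the unlimited-strength policy is in effect. */
    static boolean hasUnlimitedCrypto() {
        return maxAesKeyBits() == Integer.MAX_VALUE;
    }

    /** True when a provider named "BC" (bouncy castle's usual name) is registered. */
    static boolean bouncyCastleRegistered() {
        return Security.getProvider("BC") != null;
    }

    public static void main(String[] args) {
        System.out.println("Max AES key bits:          " + maxAesKeyBits());
        System.out.println("Unlimited JCE:             " + hasUnlimitedCrypto());
        System.out.println("Bouncy castle registered:  " + bouncyCastleRegistered());
    }
}
```

This only checks provider registration; a bouncy castle jar can sit on the classpath without being registered as a security provider.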

This PR is not ready without:

  • Jenkins test runs to see what it is happy with
  • more testing: repeated runs, another endpoint
  • looking at the new deprecation warnings and selectively addressing them (the AWS SDKs are pretty aggressive about deprecation, but sometimes they increase the complexity of the client code or block some codepaths off completely)

…k to match.

Change-Id: Ic2d12a07d273bd1b6fc4c681075070f22ed1e44c
@steveloughran

Contributor Author

As noted in #22146, stripping off bouncy castle and upgrading the SDK worked. But a local test run of just this patch brought up the same error seen in #22081:

WithoutAggregationKinesisStreamSuite:
- KinesisUtils API
- RDD generation
- basic operation
- custom message handling *** FAILED ***
  The code passed to eventually never returned normally. Attempted 20 times over 2.092846262916667 minutes. Last failure message: collected.synchronized[Boolean](KinesisStreamTests.this.convertToEqualizer[scala.collection.mutable.HashSet[Int]](collected).===(modData.toSet[Int])(scalactic.this.Equality.default[scala.collection.mutable.HashSet[Int]])) was false
  Data received does not match data sent. (KinesisStreamSuite.scala:230)
- Kinesis read with custom configurations
- split and merge shards in a stream
- failure recovery *** FAILED ***
  The code passed to eventually never returned normally. Attempted 105 times over 2.0055098129 minutes. Last failure message: isCheckpointPresent was true, but 0 was not greater than 10. (KinesisStreamSuite.scala:398)

That wasn't a full clean build, so let's see what Jenkins says and some more test runs tomorrow. It could just be this is all showing up some flakiness in the test case. At the very least, some more details on the failure might be good.
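
For readers unfamiliar with the "Attempted N times over M minutes" messages above: they come from a retry-until-timeout assertion. The following is only a rough Java sketch of that pattern, not ScalaTest's actual implementation; the class `Eventually` and its method signature are made up for illustration.

```java
import java.time.Duration;

/** Hypothetical retry helper illustrating the "eventually" pattern. */
final class Eventually {
    private Eventually() {}

    /**
     * Re-runs the check until it stops throwing AssertionError or the
     * timeout elapses. Returns the number of attempts that were needed.
     */
    static int eventually(Duration timeout, Duration interval, Runnable check) {
        long start = System.nanoTime();
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                check.run();
                return attempts; // the check passed
            } catch (AssertionError e) {
                long elapsed = System.nanoTime() - start;
                if (elapsed >= timeout.toNanos()) {
                    // Give up and report how often we tried and the last failure.
                    throw new AssertionError(
                        "The check never passed. Attempted " + attempts
                            + " times over " + (elapsed / 1e9) + " seconds."
                            + " Last failure message: " + e.getMessage(), e);
                }
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting to retry", ie);
            }
        }
    }
}
```

Under this pattern a failure means the condition stayed false for the whole timeout, which fits either a real regression or records arriving too slowly from the endpoint. That is why more detail on the failure would help separate the two.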

@steveloughran

Contributor Author

@srowen @budde @ajfabbri

SparkQA commented Aug 14, 2018

Test build #94728 has finished for PR 22099 at commit e79e5b9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen left a comment

Member

Looks OK in principle, pending tests.

SparkQA commented Aug 14, 2018

Test build #4246 has finished for PR 22099 at commit e79e5b9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 14, 2018

Test build #4247 has started for PR 22099 at commit e79e5b9.

@steveloughran

Contributor Author

Local kinesis tests with -Phadoop-3.1, -Phadoop-2.7 and -Phadoop-3.1 -Dhadoop.version=3.1.1 are all working here (with bouncycastle, unlimited JCE in JVM).

I'm updating the #21146 PR with this patch to see what happens with the combination in Jenkins of no bouncycastle, updated Kinesis.

The test run failure here was org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka; it is hard to see how that relates to this patch.

@dongjoon-hyun

Member

Retest this please.

SparkQA commented Aug 15, 2018

Test build #94776 has finished for PR 22099 at commit e79e5b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen commented Aug 15, 2018
Member

To be clear you think this passed because it still uses jets3t and that still brings in BC? Then we can maybe merge this and rebase the other change to find out. This update won't have changed that situation with strong crypto being required right?

@steveloughran

Contributor Author

"To be clear you think this passed because it still uses jets3t and that still brings in BC?"
Correct.

"Then we can maybe merge this and rebase the other change to find out."
Correct.

"This update won't have changed that situation with strong crypto being required right?"

don't know. What it did do was stop my local test runs without bouncy castle failing with errors about certificate validation.
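
One way to check whether bouncy castle is still reachable on a given test classpath (for example via jets3t) is to ask the JVM where the provider class was loaded from. This is a sketch only: `ClassOrigin` is a hypothetical helper, and `org.bouncycastle.jce.provider.BouncyCastleProvider` is assumed to be the class of interest.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports which jar or directory a class was loaded from. */
final class ClassOrigin {
    private ClassOrigin() {}

    /**
     * Returns the location the class was loaded from, a placeholder string
     * when the JVM does not expose one, or null when the class is absent.
     */
    static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "(bootstrap or unknown)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null; // not on the classpath, or not loadable
        }
    }

    public static void main(String[] args) {
        String bc = locate("org.bouncycastle.jce.provider.BouncyCastleProvider");
        System.out.println(bc == null
            ? "bouncy castle is not on the classpath"
            : "bouncy castle loaded from " + bc);
    }
}
```

Run with the same classpath as the Kinesis tests, a non-null result would name the jar that supplies bouncy castle.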

This patch is a good thing to do anyway, because it's good to stay somewhat current with the AWS releases (more chance of issues being addressed, reduced cost of future migrations). So it can be merged in and then the problem of getting #22081's test run to work addressed after.

I reopened #21146 and applied this patch to it, to see what Jenkins did there. The overall test runs come out as failing, and it is hard to point to any related cause, but the Kinesis ones do all pass: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94769/testReport/org.apache.spark.streaming.kinesis/

I'm going to close that one again to avoid confusion about which of the "remove jets3t" patches people should be looking at; once the kinesis update is merged in you'll need to retest your #22081 PR, and let's see what Jenkins says there.

srowen commented Aug 15, 2018
Member

Merged to master

@asfgit asfgit closed this in 4d8ae0d Aug 15, 2018
@steveloughran

Contributor Author

thanks

@steveloughran steveloughran deleted the cloud/SPARK-25111-kinesis branch August 15, 2018 22:04