Skip to content

TLS test failures#2331

Merged
mrocklin merged 8 commits into
dask:masterfrom
mrocklin:test-ci
Nov 9, 2018
Merged

TLS test failures#2331
mrocklin merged 8 commits into
dask:masterfrom
mrocklin:test-ci

Conversation

@mrocklin

@mrocklin mrocklin commented Nov 1, 2018

Copy link
Copy Markdown
Member

Do not merge

@mrocklin

mrocklin commented Nov 2, 2018

Copy link
Copy Markdown
Member Author

@jcrist @mariusvniekerk do either of you have time to check out these test failures? They're breaking tests currently.

@mrocklin mrocklin changed the title A trivial change to test CI TLS test failures Nov 3, 2018
@mrocklin

mrocklin commented Nov 3, 2018

Copy link
Copy Markdown
Member Author

Also cc @pitrou

This was referenced Nov 3, 2018
@pitrou

pitrou commented Nov 3, 2018

Copy link
Copy Markdown
Member

You should be able to pinpoint when the failures started to occur? Is it on a specific changeset?

@mrocklin

mrocklin commented Nov 3, 2018

Copy link
Copy Markdown
Member Author

I haven't tried doing this yet. They didn't show up during any PR and now they fail consistently, so I don't think that they snuck in as some intermittent failures do.

My guess is that some dependency changed upstream.

@pitrou

pitrou commented Nov 3, 2018

Copy link
Copy Markdown
Member

Do you conda list as part of CI jobs? That's very useful for cases like this, so that you can compare versions.

@pitrou

pitrou commented Nov 3, 2018

Copy link
Copy Markdown
Member

By the way, I don't think those failures are intermittent. They should even be reproducible locally, with the right software setup.

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

@pitrou Pretty certain these are related to openssl 1.1 vs 1.0

@pitrou

pitrou commented Nov 8, 2018

Copy link
Copy Markdown
Member

@mariusvniekerk do you have a proof of that? e.g. conda list output?

@mariusvniekerk

mariusvniekerk commented Nov 8, 2018

Copy link
Copy Markdown
Collaborator

yep, and the test failures are around TLS 1.3 ciphers

See: https://travis-ci.org/dask/distributed/jobs/449255237#L523

Our issue seems to be mozilla/server-side-tls#191 (comment)

@pitrou

pitrou commented Nov 8, 2018

Copy link
Copy Markdown
Member

The cipher tests should be easy to relax a bit, but the certification reject test shouldn't fail.

@mariusvniekerk

mariusvniekerk commented Nov 8, 2018

Copy link
Copy Markdown
Collaborator

it does "warn" about it failing, so it is aware of that but doesn't raise?

except EnvironmentError as e:
# The handshake went wrong, log and ignore
logger.warning("Listener on %r: TLS handshake failed with remote %r: %s",
self.listen_address, address,
getattr(e, "real_error", None) or e)
else:
raise gen.Return(stream)

@pitrou

pitrou commented Nov 8, 2018

Copy link
Copy Markdown
Member

Right, but it should raise in that case. The warning is in the case the handshake error is because of transient network issues. But if the certificate is reject, that's a permanent error and it should raise an exception. So you must inspect the exception to decide what to do with it.

@mariusvniekerk

mariusvniekerk commented Nov 8, 2018

Copy link
Copy Markdown
Collaborator

We should change the test to catch and reraise the exception (within that test) if present to see what it is, we could still be raising something but pytest just says it's not EnvironmentError. Want to just rule that one out first.
I'm not sure if i have push rights to this pr

@mrocklin

mrocklin commented Nov 8, 2018 via email

Copy link
Copy Markdown
Member Author

@mrocklin

mrocklin commented Nov 8, 2018

Copy link
Copy Markdown
Member Author

Tests seem to pass. The Python 2 failure was due to the print(..., flush=True) syntax in debug code.

@mrocklin

mrocklin commented Nov 8, 2018

Copy link
Copy Markdown
Member Author

To the extent that I'm able to judge this seems fine to me. There were a couple of test failures but I think that they are intermittent and unrelated.

@mariusvniekerk is there any additional work that you think should be done here?

Comment thread distributed/comm/tcp.py
stream = yield client.connect(ip, port,
max_buffer_size=MAX_BUFFER_SIZE,
**kwargs)
# Under certain circumstances tornado will have a closed connnection with an error and not raise

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be worth reporting the issue to the Tornado bug tracker.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin I think this gets us working for now.
@pitrou I don't think tornado is tested a python built against openssl 1.1+ atm?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, but it's still worth reporting. I generally find it annoying when people don't report issues upstream.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mrocklin

mrocklin commented Nov 8, 2018

Copy link
Copy Markdown
Member Author

I removed my initial dummy commit and added a couple small style changes. I hope that now that the first commit belongs to @mariusvniekerk that the "Squash and merge" feature will appropriately assign ownership to him

@mrocklin

mrocklin commented Nov 8, 2018

Copy link
Copy Markdown
Member Author

This does mean a force push though, so you may want to git reset --hard before pulling/pushing if there is any more to do

@mrocklin mrocklin merged commit 3fed641 into dask:master Nov 9, 2018
mrocklin pushed a commit that referenced this pull request Nov 9, 2018
* Check ciphers less aggresively for TLS1.3 ciphers

When python is compiled against openssl 1.1 we have TLS1.3 ciphers in the
mix as well.  We should treat these appropriately

* Attempt to mitigate the failed connection case

* Introduced a FatalCommClosedError

This deals with TLS errors that cannot be recovered.

* Relaxed forced cipher test
@mrocklin mrocklin deleted the test-ci branch November 9, 2018 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants