Skip to content

Fix thread safety race conditions#1

Closed
tdg5 wants to merge 1 commit intoextremeandy:masterfrom
tdg5:fix-thread-safety-issues
Closed

Fix thread safety race conditions#1
tdg5 wants to merge 1 commit intoextremeandy:masterfrom
tdg5:fix-thread-safety-issues

Conversation

@tdg5
Copy link
Copy Markdown

@tdg5 tdg5 commented May 28, 2020

Hello!

I'm sorry to inform you that this library is not thread-safe. There are race conditions around when and how many resources are created that cause the existing tests to fail when conditions are right (I run phoronix-test-suite to put my machine under heavy load which tends to exacerbate the issue).

I think these race conditions can cause some resources that are created to be lost and can cause more resources to be created than might be desirable. I didn't see more than maxNumResources resources created (though I wasn't paying close attention to this), but I did see many instances of 2x minNumResources being created when none were in use if the timing was right.

I believe these changes fix the problems I was encountering (though I still can't promise you the library is now thread-safe). With these changes in place, I was able to run the test suite hundreds of times under heavy load with out any failures.

I tried to do clever things to make things more thread-safe, but ultimately found a couple of locks were needed to prevent different threads from seeing partial state changes causing incorrect actions.

Ended up uglier than I might have liked, but I'm happy to discuss any ideas and answer any questions you might have.

Thanks!

@extremeandy
Copy link
Copy Markdown
Owner

extremeandy commented Sep 8, 2020

Hi, sorry for the slow reply, I somehow missed the notification for this.

I believe this commit fixes the issue of too many resources being created: 89d880e

I don't believe this was a thread-safety issue - the AsyncResourcePool class is using the actor pattern with ActionBlock<> and MaxDegreeOfParallelism: 1 (default behaviour) which ensures that non-thread safe objects like Queue are only ever acted on by a single thread. However I'm happy to be shown otherwise - just would like to know how it's possible that multiple threads could act on the queue simultaneously if they are constrained by the ActionBlock!

The only fields that may have contentious writes are _numResources and _numResourcesInUse and for these I use Interlocked to update their values.

Are you still able to cause the tests to fail after this commit?

https://www.nuget.org/packages/AsyncResourcePool/1.1.0

@tdg5
Copy link
Copy Markdown
Author

tdg5 commented Sep 16, 2020

Hi! I also apologize for the slow response.

It's been a while since I looked at this so my memory is a bit fuzzy. I think I'm no longer seeing inconsistencies around resource creation.

It's not obvious to me that it's as important, but I am seeing this test failure regularly:

  Error Message:
   Assert.Equal() Failure
Expected: Task<bool> { Status = WaitingForActivation }
Actual:   Task<bool> { Status = RanToCompletion }
  Stack Trace:
     at AsyncResourcePool.Tests.AsyncResourcePoolTests.SlowDisposingResources_ShouldNotBlock() in /home/danny/src/async-resource-pool/tests/AsyncResourcePoolTests.cs:line 327
--- End of stack trace from previous location where exception was thrown ---

(that line number is probably wrong, I have the test file doctored to try to reproduce resource creation inconsistencies)

But it's not clear to me that that test inconsistency is any reason to keep this PR open.

Thanks for the fix 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants