Locks, a cautionary tale

I recently tried to develop a data model that would allow for reliable locking without strong consistency. That attempt failed, but rather than throw the idea away entirely, I decided to include it here to illustrate the complexities of coping with eventual consistency.

Basic premise: multiple workers may be assigned data sets to process, but each data set should be assigned to no more than one worker.

In the absence of strong consistency, the best an application can do is use the pr and pw parameters (primary read, primary write) with a value of quorum or n_val.

So, common features to all of these models:

Lock for any given data set is a known key
Value is a unique identifier for the worker

Lock, a first draft

Sequence

Bucket: allow_mult=false

Worker reads with pr=quorum to determine whether a lock exists
If it does, move on to another data set
If it doesn't, create a lock with pw=quorum
Process data set to completion
Remove the lock

Failure scenario

Worker #1 reads the non-existent lock
Worker #2 reads the non-existent lock
Worker #2 writes its ID to the lock
Worker #2 starts processing the data set
Worker #1 writes its ID to the lock
Worker #1 starts processing the data set

Lock, a second draft

Bucket: allow_mult=false

Sequence

Worker reads with pr=quorum to determine whether a lock exists
If it does, move on to another data set
If it doesn't, create a lock with pw=quorum
Read lock again with pr=quorum
If the lock exists with another worker's ID, move on to another data set
Process data set to completion
Remove the lock

Failure scenario

Worker #1 reads the non-existent lock
Worker #2 reads the non-existent lock
Worker #2 writes its ID to the lock
Worker #2 reads the lock and sees its ID
Worker #1 writes its ID to the lock
Worker #1 reads the lock and sees its ID
Both workers process the data set

If you've done any programming with threads before, you'll recognize this as a common problem with non-atomic lock operations.

Lock, a third draft

Bucket: allow_mult=true

Sequence

Worker reads with pr=quorum to determine whether a lock exists
If it does, move on to another data set
If it doesn't, create a lock with pw=quorum
Read lock again with pr=quorum
If the lock exists with another worker's ID or the lock contains siblings, move on to another data set
Process data set to completion
Remove the lock

Failure scenario

Worker #1 reads the non-existent lock
Worker #2 reads the non-existent lock
Worker #2 writes its ID to the lock
Worker #1 writes its ID to the lock
Worker #1 reads the lock and sees a conflict
Worker #2 reads the lock and sees a conflict
Both workers move on to another data set

Lock, a fourth draft

Bucket: allow_mult=true

Sequence

Worker reads with pr=quorum to determine whether a lock exists
If it does, move on to another data set
If it doesn't, create a lock with pw=quorum and a timestamp
Read lock again with pr=quorum
If the lock exists with another worker's ID or the lock contains siblings and its timestamp is not the earliest, move on to another data set
Process data set to completion
Remove the lock

Failure scenario

Worker #1 reads the non-existent lock
Worker #2 reads the non-existent lock
Worker #2 writes its ID and timestamp to the lock
Worker #2 reads the lock and sees its ID
Worker #2 starts processing the data set
Worker #1 writes its ID and timestamp to the lock
Worker #1 reads the lock and sees its ID with the lowest timestamp
Worker #1 starts processing the data set

At this point I may hear you grumbling: clearly worker #2 would have the lower timestamp because it attempted its write first, and thus #1 would skip the data set and try another one.

Even if both workers are running on the same server (and thus probably have timestamps that can be compared)¹, perhaps worker #1 started its write earlier but contacted an overloaded cluster member that took longer to process the request.

The same failure could occur if, instead of using timestamps for comparison purposes, the ID of each worker was used for comparison purposes. All it takes is for one worker to read its own write before another worker's write request arrives.

Conclusion

We can certainly come up with algorithms that limit the number of times that multiple workers tackle the same job, but I have yet to find one that guarantees exclusion.

What I found surprising about this exercise is that none of the failure scenarios required some of the odder edge conditions that can cause unexpected outcomes. For example, pw=quorum writes will return an error to the client if 2 of the primary servers are not available, but the value will still be written to the 3rd server and 2 fallback servers. Predicting what will happens the next time someone tries to read the key is challenging.

None of these algorithms required deletion of a value, but that is particularly fraught with peril. It's not difficult to construct scenarios where deleted values reappear if servers are temporarily unavailable during the deletion request.

An important part of any distributed systems discussion is the fact that clocks are inherently untrustworthy, and thus calling any event the "last" to occur is an exercise in faith: faith that a system clock wasn't set backwards in time by ntpd or an impatient system administrator, faith that all clocks involved are in perfect sync.

And, to be clear, perfect synchronization of clocks across multiple systems is unattainable. Google is attempting to solve this by purchasing large quantities of atomic and GPS clocks at great expense, but that only narrows the margin of error. ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Locks, a cautionary tale

Lock, a first draft

Sequence

Failure scenario

Lock, a second draft

Sequence

Failure scenario

Lock, a third draft

Sequence

Failure scenario

Lock, a fourth draft

Sequence

Failure scenario

Conclusion

FilesExpand file tree

cautionary.md

Latest commit

History

cautionary.md

File metadata and controls

Locks, a cautionary tale

Lock, a first draft

Sequence

Failure scenario

Lock, a second draft

Sequence

Failure scenario

Lock, a third draft

Sequence

Failure scenario

Lock, a fourth draft

Sequence

Failure scenario

Conclusion

Footnotes