[GitRepository] Use multi-process file locking for git actions#19074
Conversation
In `app/models/git_repository.rb` when using `#with_worktree`, there is
both the `def worktree` method available, and the `yield`'d `worktree`
variable that can be access depending on how it was invoked. Prior to
this commit, it was being done in two ways:
# Accessing the variable via the private instance method
def method_1
with_worktree do
worktree
# do stuff
end
end
##### OR #####
# Accessing the variable via the block variable defined
def method_2
with_worktree do |worktree|
worktree
# do other stuff
end
end
This simply makes the uses of `with_worktree` in this class consistent
with eachother, and makes favors the more commonly used form (this was
the only instance of the "`method_1`" form).
app/models/git_repository.rb
Outdated
| @worktree ||= begin | ||
| clone_repo unless Dir.exist?(directory_name) | ||
| fetch_worktree | ||
| git_transaction do |
There was a problem hiding this comment.
Unsure if this is needed... I don't think GitRepository#worktree is ever accessed without GitRepository#with_worktree, but maybe I am missing something.
If nothing else, it at least tests that the "top level only" transaction thing works (the test suite should exercise this code) from EmbeddedAnsible::AutomationManager::ConfigurationScriptSource).
There was a problem hiding this comment.
I think only the clone line + directory check needs the lock around it. I'd be ok if the Dir.exist? check lived inside clone_repo method so that you can encapsulate the lock in that one method.
edb7a82 to
064af63
Compare
carbonin
left a comment
There was a problem hiding this comment.
This makes git repos process safe, but not thread safe. Should we also wrap the acquire_git_lock method in a mutex?
@carbonin after some research, I think this process already makes it thread safe in the "process" of making it "process safe": class FileLock
def self.id
if STDOUT.tty?
@thread_colors ||= {}
@current_id ||= 30
@thread_colors[Thread.current.object_id] ||= "\e[#{@current_id += 1}m#{Thread.current.object_id}\e[0m"
else
Thread.current.object_id
end
end
def self.run
puts " > #{id} starting to run..."
sleep 1
File.open ".lockfile", "w" do |lockfile|
lockfile.flock(File::LOCK_EX)
puts " > #{id} has obtained the lock!"
sleep 1
lockfile.flock(File::LOCK_UN)
puts " > #{id} has released the lock!"
end
end
end
puts "starting..."
threads = []
3.times do
threads << Thread.new { FileLock.run }
end
threads.each(&:join)
puts "completed!"$ ruby flock_thread_safety.rb
starting...
> 70269252667060 starting to run...
> 70269252667200 starting to run...
> 70269252667340 starting to run...
> 70269252667200 has obtained the lock!
> 70269252667200 has released the lock!
> 70269252667340 has obtained the lock!
> 70269252667340 has released the lock!
> 70269252667060 has obtained the lock!
> 70269252667060 has released the lock!
completed! |
4731afb to
4b2e99d
Compare
|
@Fryguy @carbonin when ever you have time, I think I am pretty much set with this one. I have provided some tests scripts and run info exercising this, but not sure it is worth trying to add to the test suite (I can if you want). Also, my concern from my last commit is still unanswered:
Honestly, doesn't seem to be hurting anything, but curious if you have strong opinions on it staying. Once a decision has been decided on, I will update that commit and un- Edit: Oh yeah, I also ran across a concern about using https://www.ruby-forum.com/t/process-based-locks-for-files/66865/2 Though there isn't much context or proof around that statement, but maybe something I should try with one of my above scripts. |
app/models/git_repository.rb
Outdated
|
|
||
| # NOTE: Please make sure to use this whenever you implement a method that is | ||
| # modifying the `.git` data (clone, fetch, checking, etc.) so that we can be | ||
| # sure this happens safely between processes. |
There was a problem hiding this comment.
Rugged handles locking internally, so 2 separate instances of a Rugged::Repo doing a fetch will not collide with each other. The only thing we need to handle are clones because that's occurring outside of a "repo" context.
There was a problem hiding this comment.
I think if there are multiple processes looking at the same repo, I'd also like to not serialize them, allowing them to execute git stuff in parallel.
There was a problem hiding this comment.
Well this would have been good to know prior to doing all this work...
There was a problem hiding this comment.
Okay, "salty Nick" has left the building...
Does it make sense to then reuse the locking code that is in GitWorktree/Rugged (honestly still not fully up to speed on which does it currently):
https://github.com/ManageIQ/manageiq/blob/master/lib/git_worktree.rb#L366-L391
or stick with what I have done here?
There was a problem hiding this comment.
I think the linked locking bits would still have to be after the clone, so it doesn't help us in this case.
app/models/git_repository.rb
Outdated
| def with_worktree | ||
| handling_worktree_errors do | ||
| yield worktree | ||
| git_transaction do |
There was a problem hiding this comment.
So this one should be dropped, IMO.
There was a problem hiding this comment.
I still don't think this should lock here as this effectively serializes every operation, and we only want to serialize the clones
And also defines the following private helper methods and constants: - `LOCKFILE_DIR` # Where the locks are stored - `#git_lock_filename` # name of the lockfile for this record - `#acquire_git_lock` # fetches a lock for the instance (one per) - `#release_git_lock` # releases existing lock for other processes `#git_transaction` is meant to be a multi-purpose method for dealing with any file system changes so that only one process is able to interact with it at a time. According to the docs (`ri File#flock`), `File#flock(FILE::LOCK_EX)` should block until the lock is available for the process, so the first process to acquire the lock will process first. The method is also designed so that nested `#git_transaction` calls will no-op and not release the lock pre-maturely. This is important for running multiple actions that by design need to happen by a single process, but the sub actions are also configured to use the lock. Implementation of this method will be done in a followup commit, but the method is also meant to be a public interface so that when existing methods implement this method, locks will work just fine with user defined `git_transaction` blocks as well as with GitRepository method API.
4b2e99d to
58dce4b
Compare
|
@carbonin @Fryguy Sorry for the delay on this, but I did want to test this properly (manually) before I got back. Changed 3 things:
In regards to the (and I saved you from the almost 800 lines of warnings preceding this output...) I did also quickly confirm that not using the |
|
Looks good to me. Will let @Fryguy have the last say here. |
|
The two things I was requesting were not done yet...that is, remove the git_transaction from with_worktree and only do git_transaction from within clone_repo (also putting the Dir.exists? call inside of clone_repo. I want to avoid the locking overhead for every operation when it's only needed for when two processes compete for a clone. |
I was under the impression after our (out of band) meeting on Friday that we had come to the consensus that it was not a problem leaving it in as is, but apparently not. Will change in a bit. |
58dce4b to
631e488
Compare
|
@Fryguy comments addressed |
Changed `#clone_repo` to `#clone_repo_if_missing`, and encapsulated the logic that existed in `#worktree` (dir check) in it as well. Also wrapped the method in a `#git_transaction`, which makes sure only one process handles this at a time.
631e488 to
b38bfb3
Compare
|
Checked commits NickLaMuro/manageiq@f136c1a~...b38bfb3 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0 |

Adds
GitRepository#git_transactionmethod as well as the following private helper methods and constants:GitRepository#git_transactionis meant to be a multi-purpose method for dealing with any file system changes so that only one process is able to interact with it at a time. According to the docs (ri File#flock),File#flock(FILE::LOCK_EX)should block until the lock is available for the process, so the first process to acquire the lock will process first.The method is also designed so that nested
GitRepository#git_transactioncalls will no-op and not release the lock pre-maturely. This is important for running multiple actions that by design need to happen by a single process, but the sub actions are also configured to use the lock.Implementation of this method will be done in a followup commit, but the method is also meant to be a public interface so that when existing methods implement this method, locks will work just fine with user defined
git_transactionblocks as well as with theGitRepositorypublic API.Links
git clonenot processes safe" concern from RFEAndibleRunnerforEmbeddedAnsibleRFE: [RFC] EmbeddedAnsible with ansible-runner-based implementation manageiq-design#45Steps for Testing/QA
¯\(°_o)/¯
But in all seriousness, this is really hard (impossible) to test since it requires running two processes and hitting a process timing where this would become an issue. That said, hopefully it doesn't cause the existing specs to break.While not ideal, I have put together a test script for this:
https://gist.github.com/NickLaMuro/20dc0795410abf448402c54f707a59dc
Running that seemed to prove that things were working as expected, in addition to the
Threadbased test I provided in a comment below:#19074 (comment)