[GitRepository] Use multi-process file locking for git actions by NickLaMuro · Pull Request #19074 · ManageIQ/manageiq

NickLaMuro · 2019-07-29T23:11:34Z

Adds GitRepository#git_transaction method as well as the following private helper methods and constants:

GitRepository::LOCKFILE_DIR  # Where the locks are stored on the appliance

repo = GitRepository.new
repo.git_lock_filename       # name of the lockfile for this record
repo.acquire_git_lock        # fetches a lock for the instance (one per)
repo.release_git_lock        # releases existing lock for other processes

GitRepository#git_transaction is meant to be a multi-purpose method for dealing with any file system changes so that only one process is able to interact with it at a time. According to the docs (ri File#flock), File#flock(FILE::LOCK_EX) should block until the lock is available for the process, so the first process to acquire the lock will process first.

The method is also designed so that nested GitRepository#git_transaction calls will no-op and not release the lock pre-maturely. This is important for running multiple actions that by design need to happen by a single process, but the sub actions are also configured to use the lock.

Implementation of this method will be done in a followup commit, but the method is also meant to be a public interface so that when existing methods implement this method, locks will work just fine with user defined git_transaction blocks as well as with the GitRepository public API.

Links

Addressing "git clone not processes safe" concern from RFE
AndibleRunner for EmbeddedAnsible RFE: [RFC] EmbeddedAnsible with ansible-runner-based implementation manageiq-design#45

Steps for Testing/QA

¯\(°_o)/¯

But in all seriousness, this is really hard (impossible) to test since it requires running two processes and hitting a process timing where this would become an issue. That said, hopefully it doesn't cause the existing specs to break.

While not ideal, I have put together a test script for this:

https://gist.github.com/NickLaMuro/20dc0795410abf448402c54f707a59dc

Running that seemed to prove that things were working as expected, in addition to the Thread based test I provided in a comment below:

#19074 (comment)

In `app/models/git_repository.rb` when using `#with_worktree`, there is both the `def worktree` method available, and the `yield`'d `worktree` variable that can be access depending on how it was invoked. Prior to this commit, it was being done in two ways: # Accessing the variable via the private instance method def method_1 with_worktree do worktree # do stuff end end ##### OR ##### # Accessing the variable via the block variable defined def method_2 with_worktree do |worktree| worktree # do other stuff end end This simply makes the uses of `with_worktree` in this class consistent with eachother, and makes favors the more commonly used form (this was the only instance of the "`method_1`" form).

NickLaMuro · 2019-07-29T23:14:05Z

app/models/git_repository.rb

    @worktree ||= begin
-      clone_repo unless Dir.exist?(directory_name)
-      fetch_worktree
+      git_transaction do


Unsure if this is needed... I don't think GitRepository#worktree is ever accessed without GitRepository#with_worktree, but maybe I am missing something.

If nothing else, it at least tests that the "top level only" transaction thing works (the test suite should exercise this code) from EmbeddedAnsible::AutomationManager::ConfigurationScriptSource).

I think only the clone line + directory check needs the lock around it. I'd be ok if the Dir.exist? check lived inside clone_repo method so that you can encapsulate the lock in that one method.

NickLaMuro · 2019-07-29T23:15:33Z

@miq-bot add_label core, embedded ansible

(not sure if this should be a bug or an enhancement...)

@Fryguy and @carbonin please review

carbonin

This makes git repos process safe, but not thread safe. Should we also wrap the acquire_git_lock method in a mutex?

app/models/git_repository.rb

NickLaMuro · 2019-07-30T22:47:18Z

This makes git repos process safe, but not thread safe. Should we also wrap the acquire_git_lock method in a mutex?

@carbonin after some research, I think this process already makes it thread safe in the "process" of making it "process safe":

class FileLock
  def self.id
    if STDOUT.tty?
      @thread_colors ||= {}
      @current_id    ||= 30

      @thread_colors[Thread.current.object_id] ||= "\e[#{@current_id += 1}m#{Thread.current.object_id}\e[0m"
    else
      Thread.current.object_id
    end
  end

  def self.run
    puts "  > #{id} starting to run..."
    sleep 1

    File.open ".lockfile", "w" do |lockfile|
      lockfile.flock(File::LOCK_EX)
      puts "  > #{id} has obtained the lock!"
      sleep 1
      lockfile.flock(File::LOCK_UN)
      puts "  > #{id} has released the lock!"
    end
  end
end

puts "starting..."

threads = []
3.times do
  threads << Thread.new { FileLock.run }
end

threads.each(&:join)

puts "completed!"

$ ruby flock_thread_safety.rb
starting...
  > 70269252667060 starting to run...
  > 70269252667200 starting to run...
  > 70269252667340 starting to run...
  > 70269252667200 has obtained the lock!
  > 70269252667200 has released the lock!
  > 70269252667340 has obtained the lock!
  > 70269252667340 has released the lock!
  > 70269252667060 has obtained the lock!
  > 70269252667060 has released the lock!
completed!

NickLaMuro · 2019-07-31T01:03:54Z

@Fryguy @carbonin when ever you have time, I think I am pretty much set with this one.

I have provided some tests scripts and run info exercising this, but not sure it is worth trying to add to the test suite (I can if you want).

Also, my concern from my last commit is still unanswered:

GitRepository#worktree probably doesn't need #git_transaction since it should only be called from #with_worktree, but want to confirm with others.

Honestly, doesn't seem to be hurting anything, but curious if you have strong opinions on it staying. Once a decision has been decided on, I will update that commit and un-[WIP]

Edit: Oh yeah, I also ran across a concern about using File#flock on NFS:

https://www.ruby-forum.com/t/process-based-locks-for-files/66865/2

Though there isn't much context or proof around that statement, but maybe something I should try with one of my above scripts.

Fryguy · 2019-08-01T19:54:33Z

app/models/git_repository.rb


+  # NOTE:  Please make sure to use this whenever you implement a method that is
+  # modifying the `.git` data (clone, fetch, checking, etc.) so that we can be
+  # sure this happens safely between processes.


Rugged handles locking internally, so 2 separate instances of a Rugged::Repo doing a fetch will not collide with each other. The only thing we need to handle are clones because that's occurring outside of a "repo" context.

I think if there are multiple processes looking at the same repo, I'd also like to not serialize them, allowing them to execute git stuff in parallel.

Well this would have been good to know prior to doing all this work...

Okay, "salty Nick" has left the building...

Does it make sense to then reuse the locking code that is in GitWorktree/Rugged (honestly still not fully up to speed on which does it currently):

https://github.com/ManageIQ/manageiq/blob/master/lib/git_worktree.rb#L366-L391

or stick with what I have done here?

I think the linked locking bits would still have to be after the clone, so it doesn't help us in this case.

app/models/git_repository.rb

Fryguy · 2019-08-02T19:44:35Z

app/models/git_repository.rb

  def with_worktree
-    handling_worktree_errors do
-      yield worktree
+    git_transaction do


So this one should be dropped, IMO.

I still don't think this should lock here as this effectively serializes every operation, and we only want to serialize the clones

And also defines the following private helper methods and constants: - `LOCKFILE_DIR` # Where the locks are stored - `#git_lock_filename` # name of the lockfile for this record - `#acquire_git_lock` # fetches a lock for the instance (one per) - `#release_git_lock` # releases existing lock for other processes `#git_transaction` is meant to be a multi-purpose method for dealing with any file system changes so that only one process is able to interact with it at a time. According to the docs (`ri File#flock`), `File#flock(FILE::LOCK_EX)` should block until the lock is available for the process, so the first process to acquire the lock will process first. The method is also designed so that nested `#git_transaction` calls will no-op and not release the lock pre-maturely. This is important for running multiple actions that by design need to happen by a single process, but the sub actions are also configured to use the lock. Implementation of this method will be done in a followup commit, but the method is also meant to be a public interface so that when existing methods implement this method, locks will work just fine with user defined `git_transaction` blocks as well as with GitRepository method API.

NickLaMuro · 2019-08-05T17:43:04Z

@carbonin @Fryguy Sorry for the delay on this, but I did want to test this properly (manually) before I got back. Changed 3 things:

Removed the .unlink comment
Added a comment amount thread safety issue (when sharing an instance)
Made the @git_lock an attr_reader

In regards to the @git_lock change to an attr_reader, it worked as expected when running my test script with RUBYOPT="-w", and there even is a good example of this not being done properly with decendent_loader.rb in the resulting screenshot:

(and I saved you from the almost 800 lines of warnings preceding this output...)

I did also quickly confirm that not using the attr_reader and calling @git_lock.nil? directly will cause the warning to be displayed, so thank you for the suggestion.

carbonin · 2019-08-05T21:56:27Z

Looks good to me. Will let @Fryguy have the last say here.

Fryguy · 2019-08-06T00:50:22Z

The two things I was requesting were not done yet...that is, remove the git_transaction from with_worktree and only do git_transaction from within clone_repo (also putting the Dir.exists? call inside of clone_repo.

I want to avoid the locking overhead for every operation when it's only needed for when two processes compete for a clone.

NickLaMuro · 2019-08-06T01:02:41Z

I want to avoid the locking overhead for every operation when it's only needed for when two processes compete for a clone.

I was under the impression after our (out of band) meeting on Friday that we had come to the consensus that it was not a problem leaving it in as is, but apparently not.

Will change in a bit.

NickLaMuro · 2019-08-06T01:15:54Z

@Fryguy comments addressed

Changed `#clone_repo` to `#clone_repo_if_missing`, and encapsulated the logic that existed in `#worktree` (dir check) in it as well. Also wrapped the method in a `#git_transaction`, which makes sure only one process handles this at a time.

miq-bot · 2019-08-06T01:25:42Z

Checked commits NickLaMuro/manageiq@f136c1a~...b38bfb3 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 0 offenses detected
Everything looks fine. 🏆

NickLaMuro commented Jul 29, 2019

View reviewed changes

miq-bot added wip core core/embedded ansible labels Jul 29, 2019

NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch from edb7a82 to 064af63 Compare July 30, 2019 05:12

carbonin suggested changes Jul 30, 2019

View reviewed changes

app/models/git_repository.rb Outdated Show resolved Hide resolved

app/models/git_repository.rb Outdated Show resolved Hide resolved

app/models/git_repository.rb Outdated Show resolved Hide resolved

NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch 3 times, most recently from 4731afb to 4b2e99d Compare July 31, 2019 00:59

NickLaMuro requested a review from carbonin July 31, 2019 01:04