
[GitRepository] Use multi-process file locking for git actions #19074

Merged
carbonin merged 3 commits into ManageIQ:master from NickLaMuro:ansible-runner-use-flock-with-git-repositories on Aug 6, 2019

@NickLaMuro
Member

@NickLaMuro NickLaMuro commented Jul 29, 2019

Adds GitRepository#git_transaction method as well as the following private helper methods and constants:

GitRepository::LOCKFILE_DIR  # Where the locks are stored on the appliance

repo = GitRepository.new
repo.git_lock_filename       # name of the lockfile for this record
repo.acquire_git_lock        # fetches a lock for the instance (one per)
repo.release_git_lock        # releases existing lock for other processes

GitRepository#git_transaction is meant to be a multi-purpose method for wrapping any file system changes so that only one process can make them at a time. According to the docs (ri File#flock), File#flock(File::LOCK_EX) blocks until the lock is available to the process, so whichever process acquires the lock first runs first, and the others wait until it is released.

The method is also designed so that nested GitRepository#git_transaction calls are no-ops and do not release the lock prematurely. This is important when several actions must run under a single process's lock, but those sub-actions are also written to take the lock themselves.

Using this method within the existing methods will be done in a follow-up commit, but the method is also meant to be a public interface, so once existing methods use it, locking will work correctly with user-defined git_transaction blocks as well as with the GitRepository public API.
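For reviewers, here is a minimal sketch of how `#git_transaction` and its helpers could fit together. It is not the code in this PR: the class name `GitRepositorySketch`, the `LOCKFILE_DIR` location, and the lockfile naming scheme are assumptions made so the example runs on its own. The nested no-op comes from checking `git_lock` before acquiring, so only the outermost call releases the lock.

```ruby
require "fileutils"
require "tmpdir"

# Sketch only: class name, LOCKFILE_DIR value and lockfile naming are
# assumptions, not the code merged in this PR.
class GitRepositorySketch
  LOCKFILE_DIR = File.join(Dir.tmpdir, "git_repository_locks") # hypothetical location

  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Runs the block while holding an exclusive flock on this record's lockfile.
  # If this instance already holds the lock (a nested call), just yield, so
  # only the outermost call releases it.
  def git_transaction
    return yield if git_lock

    begin
      acquire_git_lock
      yield
    ensure
      release_git_lock
    end
  end

  private

  # Reading the ivar through a reader avoids the "instance variable
  # @git_lock not initialized" warning that Rubies before 3.0 print under -w.
  attr_reader :git_lock

  def git_lock_filename
    File.join(LOCKFILE_DIR, "git_repository_#{id}.lock")
  end

  # Blocks until no other open handle on the lockfile holds the lock.
  def acquire_git_lock
    FileUtils.mkdir_p(LOCKFILE_DIR)
    @git_lock = File.open(git_lock_filename, File::RDWR | File::CREAT, 0o644)
    @git_lock.flock(File::LOCK_EX)
  end

  # Unlocks and closes the handle; safe to call if acquiring never happened.
  def release_git_lock
    return if git_lock.nil?

    git_lock.flock(File::LOCK_UN)
    git_lock.close
    @git_lock = nil
  end
end
```

One consequence of the nested no-op check: two threads sharing the same instance would both see `git_lock` set and run concurrently, so in this sketch the lock only serializes separate instances and separate processes.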

Links

Steps for Testing/QA

¯\_(°_o)_/¯

But in all seriousness, this is really hard (impossible?) to test, since it requires running two processes and hitting the exact timing where this becomes an issue. That said, hopefully it doesn't break the existing specs.

While not ideal, I have put together a test script for this:

https://gist.github.com/NickLaMuro/20dc0795410abf448402c54f707a59dc

Running that seemed to confirm that things work as expected, as did the Thread-based test I provided in a comment below:

#19074 (comment)
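As a complement to the timing-dependent scripts, a deterministic two-process check can be sketched with a non-blocking lock request (`File::LOCK_NB`) in a forked child: instead of trying to hit the race, the parent holds the lock and the child checks whether it can take it. This assumes a platform with `fork` (Linux/macOS); the lockfile path and the `child_can_lock?` helper are arbitrary choices for illustration, not part of the PR or the gist.

```ruby
require "tmpdir"

LOCK_PATH = File.join(Dir.tmpdir, "flock_fork_demo.lock") # arbitrary demo path

# Forks a child that opens its own handle on the lockfile and asks for an
# exclusive lock without blocking. Returns true if the child got the lock.
def child_can_lock?(path)
  pid = fork do
    File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
      # flock returns 0 on success and false when LOCK_NB would have blocked.
      # exit! skips any at_exit hooks inherited from the parent.
      exit!(f.flock(File::LOCK_EX | File::LOCK_NB) ? 0 : 1)
    end
  end
  _, status = Process.wait2(pid)
  status.success?
end

held_result = released_result = nil
File.open(LOCK_PATH, File::RDWR | File::CREAT, 0o644) do |parent_lock|
  parent_lock.flock(File::LOCK_EX)
  held_result = child_can_lock?(LOCK_PATH)     # parent still holds the lock
  parent_lock.flock(File::LOCK_UN)
  released_result = child_can_lock?(LOCK_PATH) # parent has released it
end

puts "child locked while parent held it: #{held_result}"      # expected: false
puts "child locked after parent released: #{released_result}" # expected: true
```

Because the child never waits, the outcome does not depend on scheduling, which makes this shape easier to turn into a spec than a sleep-based script.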

In `app/models/git_repository.rb`, when using `#with_worktree`, both
the `def worktree` method and the `yield`'d `worktree` block variable
are available, and either can be accessed depending on how the block
was invoked.  Prior to this commit, it was being done in two ways:

     # Accessing the variable via the private instance method
     def method_1
       with_worktree do
         worktree
         # do stuff
       end
     end

     ##### OR #####

     # Accessing the variable via the block variable defined
     def method_2
       with_worktree do |worktree|
         worktree
         # do other stuff
       end
     end

This simply makes the uses of `with_worktree` in this class consistent
with each other, and favors the more commonly used form (this was the
only instance of the "`method_1`" form).
     @worktree ||= begin
     clone_repo unless Dir.exist?(directory_name)
     fetch_worktree
     git_transaction do
Member Author

@NickLaMuro NickLaMuro Jul 29, 2019


Unsure if this is needed... I don't think GitRepository#worktree is ever accessed without GitRepository#with_worktree, but maybe I am missing something.

If nothing else, it at least tests that the "top level only" transaction behavior works, since the test suite should exercise this code (from EmbeddedAnsible::AutomationManager::ConfigurationScriptSource).

Member


I think only the clone line + directory check needs the lock around it. I'd be ok if the Dir.exist? check lived inside clone_repo method so that you can encapsulate the lock in that one method.

@NickLaMuro
Member Author

@miq-bot add_label core, embedded ansible

(not sure if this should be a bug or an enhancement...)

@Fryguy and @carbonin please review

@NickLaMuro NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch from edb7a82 to 064af63 July 30, 2019 05:12
Member

@carbonin carbonin left a comment


This makes git repos process safe, but not thread safe. Should we also wrap the acquire_git_lock method in a mutex?

@NickLaMuro
Member Author

> This makes git repos process safe, but not thread safe. Should we also wrap the acquire_git_lock method in a mutex?

@carbonin after some research, I think this process already makes it thread safe in the "process" of making it "process safe":

class FileLock
  def self.id
    if STDOUT.tty?
      @thread_colors ||= {}
      @current_id    ||= 30

      @thread_colors[Thread.current.object_id] ||= "\e[#{@current_id += 1}m#{Thread.current.object_id}\e[0m"
    else
      Thread.current.object_id
    end
  end

  def self.run
    puts "  > #{id} starting to run..."
    sleep 1

    File.open ".lockfile", "w" do |lockfile|
      lockfile.flock(File::LOCK_EX)
      puts "  > #{id} has obtained the lock!"
      sleep 1
      lockfile.flock(File::LOCK_UN)
      puts "  > #{id} has released the lock!"
    end
  end
end

puts "starting..."

threads = []
3.times do
  threads << Thread.new { FileLock.run }
end

threads.each(&:join)

puts "completed!"
$ ruby flock_thread_safety.rb
starting...
  > 70269252667060 starting to run...
  > 70269252667200 starting to run...
  > 70269252667340 starting to run...
  > 70269252667200 has obtained the lock!
  > 70269252667200 has released the lock!
  > 70269252667340 has obtained the lock!
  > 70269252667340 has released the lock!
  > 70269252667060 has obtained the lock!
  > 70269252667060 has released the lock!
completed!
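A caveat on why the thread test above serializes: each thread calls `File.open` itself, so each gets an independent open file description, and `flock` treats those as separate lock holders even inside one process (as on Linux, where `flock(2)` locks belong to the open file description). The sketch below, with an arbitrary demo path, shows that a second handle is refused while the first holds the lock, whereas asking again on the *same* handle succeeds at once. So the lock protects threads only as long as they do not share a handle, or an object that caches one.

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "flock_handle_demo.lock") # arbitrary demo path

first  = File.open(path, File::RDWR | File::CREAT, 0o644)
second = File.open(path, File::RDWR | File::CREAT, 0o644)

first.flock(File::LOCK_EX)

# A separately opened handle conflicts, even in the same thread and process.
second_result = second.flock(File::LOCK_EX | File::LOCK_NB)     # => false

# Re-requesting on the handle that already holds the lock does not block.
same_handle_result = first.flock(File::LOCK_EX | File::LOCK_NB) # => 0

first.flock(File::LOCK_UN)

# Once released, the other handle can take the lock.
after_release_result = second.flock(File::LOCK_EX | File::LOCK_NB) # => 0

[first, second].each(&:close)
```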

@NickLaMuro NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch 3 times, most recently from 4731afb to 4b2e99d July 31, 2019 00:59
@NickLaMuro
Member Author

NickLaMuro commented Jul 31, 2019

@Fryguy @carbonin whenever you have time, I think I am pretty much set with this one.

I have provided some test scripts and run info exercising this, but I'm not sure it is worth trying to add them to the test suite (I can if you want).

Also, my concern from my last commit is still unanswered:

GitRepository#worktree probably doesn't need #git_transaction since it should only be called from #with_worktree, but want to confirm with others.

Honestly, it doesn't seem to be hurting anything, but I'm curious if you have strong opinions on it staying. Once a decision has been made, I will update that commit and un-[WIP] this.

Edit: Oh yeah, I also ran across a concern about using File#flock on NFS:

https://www.ruby-forum.com/t/process-based-locks-for-files/66865/2

There isn't much context or proof behind that statement, though, so it may be something I should try with one of my scripts above.

@NickLaMuro NickLaMuro requested a review from carbonin July 31, 2019 01:04

# NOTE: Please make sure to use this whenever you implement a method that
# modifies the `.git` data (clone, fetch, checkout, etc.) so that we can be
# sure this happens safely between processes.
Member

@Fryguy Fryguy Aug 1, 2019


Rugged handles locking internally, so two separate instances of a Rugged::Repository doing a fetch will not collide with each other. The only thing we need to handle is clones, because those occur outside of a "repo" context.

Member


I think if there are multiple processes looking at the same repo, I'd also like to not serialize them, allowing them to execute git stuff in parallel.

Member Author


Well this would have been good to know prior to doing all this work...

Member Author


Okay, "salty Nick" has left the building...

Does it make sense to then reuse the locking code that is in GitWorktree/Rugged (honestly still not fully up to speed on which does it currently):

https://github.com/ManageIQ/manageiq/blob/master/lib/git_worktree.rb#L366-L391

or stick with what I have done here?

Member


I think the linked locking bits would still have to be after the clone, so it doesn't help us in this case.

     def with_worktree
     handling_worktree_errors do
     yield worktree
     git_transaction do
Member


So this one should be dropped, IMO.

Member


I still don't think this should lock here, as this effectively serializes every operation, and we only want to serialize the clones.

And also defines the following private helper methods and constants:

- `LOCKFILE_DIR`         # Where the locks are stored
- `#git_lock_filename`   # name of the lockfile for this record
- `#acquire_git_lock`    # fetches a lock for the instance (one per)
- `#release_git_lock`    # releases existing lock for other processes

`#git_transaction` is meant to be a multi-purpose method for wrapping
any file system changes so that only one process is able to make them
at a time.  According to the docs (`ri File#flock`),
`File#flock(File::LOCK_EX)` blocks until the lock is available to the
process, so whichever process acquires the lock first runs first, and
the others wait until it is released.

The method is also designed so that nested `#git_transaction` calls
are no-ops and do not release the lock prematurely.  This is important
when several actions must run under a single process's lock, but those
sub-actions are also written to take the lock themselves.

Using this method within the existing methods will be done in a
follow-up commit, but the method is also meant to be a public
interface, so once existing methods use it, locking will work correctly
with user-defined `git_transaction` blocks as well as with the
GitRepository method API.
@NickLaMuro NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch from 4b2e99d to 58dce4b August 5, 2019 17:32
@NickLaMuro
Member Author

NickLaMuro commented Aug 5, 2019

@carbonin @Fryguy Sorry for the delay on this, but I wanted to test this properly (manually) before getting back to you. Changed 3 things:

  • Removed the .unlink comment
  • Added a comment about the thread safety issue (when sharing an instance)
  • Made the @git_lock an attr_reader

Regarding changing @git_lock to an attr_reader: it worked as expected when running my test script with RUBYOPT="-w", and there is even a good example of this not being done properly in descendant_loader.rb in the resulting screenshot:

[Screenshot: git_transaction_example_with_dash_w]

(and I saved you from the almost 800 lines of warnings preceding this output...)

I did also quickly confirm that not using the attr_reader and calling @git_lock.nil? directly will cause the warning to be displayed, so thank you for the suggestion.

@NickLaMuro NickLaMuro changed the title [WIP][GitRepository] Use multi-process file locking for git actions [GitRepository] Use multi-process file locking for git actions Aug 5, 2019
@miq-bot miq-bot removed the wip label Aug 5, 2019
@carbonin
Member

carbonin commented Aug 5, 2019

Looks good to me. Will let @Fryguy have the last say here.

@Fryguy
Member

Fryguy commented Aug 6, 2019

The two things I was requesting were not done yet... that is, remove the git_transaction from with_worktree and only do git_transaction from within clone_repo (also putting the Dir.exist? call inside of clone_repo).

I want to avoid the locking overhead for every operation when it's only needed for when two processes compete for a clone.

@NickLaMuro
Member Author

> I want to avoid the locking overhead for every operation when it's only needed for when two processes compete for a clone.

I was under the impression after our (out of band) meeting on Friday that we had come to the consensus that it was not a problem leaving it in as is, but apparently not.

Will change in a bit.

@NickLaMuro NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch from 58dce4b to 631e488 August 6, 2019 01:15
@NickLaMuro
Member Author

@Fryguy comments addressed

Changed `#clone_repo` to `#clone_repo_if_missing`, and encapsulated the
logic that existed in `#worktree` (dir check) in it as well.

Also wrapped the method in a `#git_transaction`, which makes sure only
one process handles this at a time.
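The resulting shape of `#clone_repo_if_missing` can be sketched roughly as follows. This is an illustration, not the merged code: `CloneSketch`, `with_file_lock` (standing in for `#git_transaction`), and `perform_clone` (standing in for the real clone, here just a `mkdir`) are hypothetical. The key detail is that the `Dir.exist?` check runs inside the lock; a check outside it could see a directory that another process is still in the middle of cloning.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for GitRepository; only the locking shape matters here.
class CloneSketch
  attr_reader :directory_name, :clone_count

  def initialize(directory_name, lock_path)
    @directory_name = directory_name
    @lock_path      = lock_path
    @clone_count    = 0
  end

  # Check-and-clone happens entirely under the lock, so a process that was
  # waiting sees the finished directory and skips the clone.
  def clone_repo_if_missing
    with_file_lock do
      return if Dir.exist?(directory_name)

      perform_clone
    end
  end

  private

  # Stand-in for #git_transaction: each call opens its own handle, so
  # separate instances (and processes) are serialized by flock.
  def with_file_lock
    File.open(@lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
      lock.flock(File::LOCK_EX) # blocks until any other holder releases
      yield
    end # closing the handle releases the lock, even on early return
  end

  # Hypothetical placeholder for the real clone.
  def perform_clone
    FileUtils.mkdir_p(directory_name)
    @clone_count += 1
  end
end
```

Every other operation (fetch, worktree access) stays outside the lock, which keeps the overhead limited to the clone case discussed above.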
@NickLaMuro NickLaMuro force-pushed the ansible-runner-use-flock-with-git-repositories branch from 631e488 to b38bfb3 August 6, 2019 01:25
@miq-bot
Member

miq-bot commented Aug 6, 2019

Checked commits NickLaMuro/manageiq@f136c1a~...b38bfb3 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 0 offenses detected
Everything looks fine. 🏆

@carbonin carbonin assigned carbonin and unassigned Fryguy Aug 6, 2019
@carbonin carbonin merged commit 8d41f64 into ManageIQ:master Aug 6, 2019
@carbonin carbonin added this to the Sprint 118 Ending Aug 19, 2019 milestone Aug 6, 2019