This repository was archived by the owner on May 18, 2022. It is now read-only.

[RFC] EmbeddedAnsible with ansible-runner-based implementation #45

@Fryguy

Description

Architecture

General approach

The current AWX implementation works by creating a provider that talks to an AWX instance and uses the provider refresh to pull data into the database. CRUD operations on AWX objects go through the provider API: the object is created in AWX and then brought in via EMS refresh. After that, callers use the ManageIQ models to do whatever they need with the data.

As such, all of the ManageIQ callers use the provider API as an abstraction layer, and we can take advantage of that. Instead of having provider CRUD operations go to a provider, we can write the data directly into the database tables as if a "refresh" had occurred immediately.
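To make the contrast concrete, here is a minimal sketch of the idea. The `InlineCrud` module and `CONFIGURATION_SCRIPT_SOURCES` array are hypothetical stand-ins (the real code writes ActiveRecord rows): instead of POSTing to AWX and waiting for a refresh to pull the object back, the "create" call writes the row directly, producing the same record a refresh would have saved.

```ruby
# Illustrative only: this simulates writing directly to the table rather
# than round-tripping through an external AWX instance plus EMS refresh.
module InlineCrud
  # Stand-in for the configuration_script_sources table.
  CONFIGURATION_SCRIPT_SOURCES = []

  # Write the attributes straight into the "table" and return the record,
  # exactly as a refresh-created record would appear to callers.
  def self.create_repository(attrs)
    record = attrs.merge(:id => CONFIGURATION_SCRIPT_SOURCES.size + 1)
    CONFIGURATION_SCRIPT_SOURCES << record
    record
  end
end

repo = InlineCrud.create_repository(:name => "my-playbooks", :scm_url => "https://example.com/repo.git")
```

Because callers already go through the model layer, they cannot tell whether the row came from a refresh or from a direct write.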

Repositories

A repository is created as a ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScriptSource (< ConfigurationScriptSource). For the implementation in this PR, the git repos are cloned into Rails.root.join("tmp/git_repos/:id"). This works great for a single appliance, but will not work as well for federated appliances, nor for appliances that can't access the internet directly. As such, a different design is needed, which is described below in the git repo management section.

Once the repository is cloned, the playbooks are each synced as a ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Playbook (< ConfigurationScriptPayload < ConfigurationScriptBase, table name configuration_scripts). In this PR I've also pulled in the "name" attribute as the playbook description, though I'm not sure if that is correct.

Service Template

When designing a service, the service template is saved as a ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScript which is a subclass of ConfigurationScript, which is a subclass of ConfigurationScriptBase (table name configuration_scripts).

CONFUSION NOTE: Both service templates and playbooks are stored in the same table, but with different subclasses and different column usage. Additionally confusing is that, unlike playbooks, which get a subclass named with the native term, the class here is ConfigurationScript instead of the native term JobTemplate, yet some of the relationships use the term job_template.
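A hedged sketch of the class layout just described may help. The class bodies below are empty stand-ins, not the real ManageIQ models: both the playbook class and the service template class ultimately inherit from ConfigurationScriptBase, so rows for both land in the single configuration_scripts table (distinguished in Rails by the STI type column).

```ruby
# Simplified stand-ins for the real models; only the ancestry matters here.
class ConfigurationScriptBase
  def self.table_name
    "configuration_scripts" # shared by every subclass (Rails STI)
  end
end

class ConfigurationScriptPayload < ConfigurationScriptBase; end
class Playbook < ConfigurationScriptPayload; end   # one row per synced playbook

class ConfigurationScript < ConfigurationScriptBase; end  # the service template class
                                                          # (native AWX term: JobTemplate)
```

Both leaf classes report the same table, which is exactly the source of the confusion the note calls out.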

For the purposes of this PoC, I've stored some of the options for the service template in the variables column, but I don't believe that is the correct way to do it. We will have to go back to the original design to see where the Tower provider stores those values during refresh.

Service execute

When an ansible service template is ordered, a ServiceTemplateProvisionRequest (< MiqRequest) is started, which goes through automate, and ultimately an instance of a ServiceAnsiblePlaybook (< Service) is executed. In the general Service flow there are two main methods that need to be implemented: execute and check_completed. In the execute method, a ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job (< OrchestrationStack) is created as a resource for this service and "launched", moving on to the check_completed step.
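The two-method contract can be sketched as follows. The class and method names follow the text above, but the internals are simplified stand-ins (the fake job "finishes" immediately), not the real ManageIQ implementations.

```ruby
# Stand-in for the EmbeddedAnsible AutomationManager::Job resource.
class FakeJob
  def launch!
    @launched = true
  end

  def finished?
    !!@launched # a real job would still be running here
  end
end

# Sketch of the Service contract: execute creates and launches the job
# resource; check_completed is polled until the job is done.
class ServiceAnsiblePlaybookSketch
  def execute
    @job = FakeJob.new
    @job.launch!
  end

  def check_completed
    @job.finished? ? [true, nil] : [false, "job still running"]
  end
end
```

The state machine driving the request calls execute once, then calls check_completed repeatedly until it returns true.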

Launching ansible-runner

For launching ansible-runner, we are using the ManageIQ::Providers::AnsibleRunnerWorkflow class, which will eventually use the Ansible::Runner helper class. (Note: this workflow class was created as a helper for provider authors to create ansible-based operations; however, the code itself is not provider-specific and should be moved out of the providers namespace and into the Ansible::Runner namespace instead.)

CONFUSION NOTE: The workflow class is a subclass of ::Job, which is our generic state machine using MiqTasks. This is completely unrelated to ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job, which is just a resource representation for the service.

The AnsibleRunnerWorkflow, being a self-contained Job, will launch ansible-runner with JSON output, asynchronously poll whether the ansible-runner execution has completed, and, once it detects completion, grab the results, store them in the MiqTask context, and clean up the ansible-runner execution temp directory.
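The launch/poll/collect/cleanup steps can be sketched like this, assuming a runner that drops JSON event files into a temp directory. `FakeRunner` is hypothetical and completes immediately; the real Ansible::Runner interface is not reproduced here.

```ruby
require "json"
require "tmpdir"

# Hypothetical runner stand-in: writes one JSON "event" file, then reports done.
class FakeRunner
  def initialize(dir)
    @dir = dir
    File.write(File.join(dir, "event1.json"), { "stdout" => "PLAY [all] ..." }.to_json)
  end

  def completed?
    true # a real runner would still be executing when first polled
  end
end

Dir.mktmpdir do |dir|
  runner = FakeRunner.new(dir)
  sleep 0.01 until runner.completed?  # asynchronous polling step
  # Collect the JSON results once complete...
  $task_context = Dir.glob(File.join(dir, "*.json")).map { |f| JSON.parse(File.read(f)) }
end # ...and mktmpdir removes the temp directory on block exit (the cleanup step)
```

Here `$task_context` stands in for the MiqTask context the workflow writes its results into.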

Service check_completed

In the meantime, the check_completed step of the ServiceAnsiblePlaybook runs periodically. In this implementation, the MiqTask associated with the AnsibleRunnerWorkflow is watched for completion. Once it has been marked as finished, the service can move on with its post-execution steps.

Services page

The services page shows the details of the ServiceAnsiblePlaybook, and the user can drill into the provision details. One of those details is the ansible stdout. In the AWX-based implementation, this was one of the few places where the database records were not used; instead, an asynchronous call would be made to AWX directly to fetch the stdout on demand. In the new ansible-runner design we don't have that option. For now, this implementation happens to have the information already stored in the AnsibleRunnerWorkflow's associated MiqTask, and since we have a relationship between the ServiceAnsiblePlaybook and the MiqTask, we can get the data directly from the database. We may not want to store this information in the MiqTask permanently, so a better design may be needed, which I elaborate on in the Ansible stdout section.

The stdout is extracted from the stored JSON records; however, it has ANSI escape codes for terminal colors embedded. In the previous implementation, one could ask AWX for the HTML version, but we don't have that in this implementation. Instead, we use the terminal ruby gem, which converts raw terminal output to HTML, replacing ANSI escape sequences with CSS classes. For this PoC, I've used the default CSS file that comes with the terminal gem, which styles the HTML by wrapping it in a div and scoping the style to that wrapper div. We will likely want the UI team to have the freedom to style this directly, so we can forgo the built-in CSS in favor of styles in our ManageIQ stylesheets.
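The conversion the terminal gem performs looks roughly like the following simplified stand-in. This is not the gem's implementation (the real `Terminal.render` handles far more escape sequences); it only illustrates the escape-sequence-to-CSS-class idea for two colors.

```ruby
# Hypothetical, minimal ANSI-to-HTML converter: map a couple of SGR color
# codes to CSS classes and close the span on reset (code 0).
ANSI_CLASSES = { "31" => "term-red", "32" => "term-green" }

def ansi_to_html(text)
  text.gsub(/\e\[(\d+)m/) do
    code = Regexp.last_match(1)
    code == "0" ? "</span>" : %(<span class="#{ANSI_CLASSES.fetch(code, 'term')}">)
  end
end

puts ansi_to_html("\e[32mok: [localhost]\e[0m")
```

The resulting spans carry only class names, which is what lets the stylesheet (the gem's default, or our own) decide the actual colors.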

Installing ansible-runner

  • On Mac

```sh
brew install ansible python
pip3 install ansible-runner
source /usr/local/Cellar/ansible/2.7.10/libexec/bin/activate && pip3 install psutil && deactivate
```

  • On Fedora/CentOS

```sh
sudo wget -O /etc/yum.repos.d/ansible-runner.repo https://releases.ansible.com/ansible-runner/ansible-runner.el7.repo
sudo dnf install ansible-runner
```

git repo management

@mkanoor and I had started on a federated git repo management design back when we had the idea that the automate models would work better stored in git repos, allowing us to run them as of any point in time, as well as giving us history tracking, auditing, and reverting capabilities.

The premise was that an appliance would be given the git_owner role, which would behave much like the db_owner role. This appliance would have internet access and thus could clone from public locations like GitHub and/or private git instances. A record would be put into the git_repositories table, so that if we needed to fail over the appliance we could re-clone.

All other appliances, if they needed to access something about the git repository, would git clone/fetch from the appliance with the git_owner role. This would allow non-internet connected appliances to get at the data in an on-demand fashion.

Some of these classes already exist, such as the GitRepository, GitReference, GitBranch, and GitTag models, as well as the GitWorktree class which manages the on-disk repositories using the rugged gem.

The work that still needs to occur is to

  • complete these classes
  • expose the git protocol from the appliance, likely through Apache, but with some sort of server to server authentication (perhaps similar to how we do MiqServer.api_system_auth_token_for_region?)
  • have a way to identify the appliance with the git_owner role, likely in a similar fashion to MiqRegion#remote_ui_miq_server

Once these are completed, we can ensure a git repo is present by checking whether our on-disk clone exists: if not, we git clone from the git_owner appliance; if it exists but is not up to date (determined by comparing against the expected SHA stored in the git_repositories table), we git fetch from the git_owner appliance.
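The ensure logic above reduces to a small decision, sketched below. The method name, arguments, and lambdas are illustrative; a real implementation would shell out to git (or use rugged) against the git_owner appliance.

```ruby
# Hypothetical helper: decide whether to clone, fetch, or do nothing,
# based on whether the on-disk repo exists and matches the expected SHA
# recorded in the git_repositories table.
def ensure_repo(on_disk_sha, expected_sha, exists:, clone:, fetch:)
  if !exists
    clone.call                    # first time: clone from the git_owner appliance
  elsif on_disk_sha != expected_sha
    fetch.call                    # stale: fetch to reach the expected SHA
  else
    :up_to_date                   # nothing to do
  end
end
```

"Update on Launch" then becomes: fetch first, update the expected SHA in the table, and run this same check before launching.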

Additionally, this would allow us to support things like "Update on Launch": we would know the expected SHA for launching and can ensure we use that SHA, so when doing an Update on Launch we git fetch first and then update the expected SHA.

Extra bonus: once all of this is done, @mkanoor and I will be able to realize our git-based automate design 😄

Seeding

I'm not sure we need to seed any more than what's in the PR (i.e., default credentials for "localhost"). The original code had to create defaults for a number of things in order to please AWX, but those aren't necessarily needed for the new implementation. Even so, we need to research each one of them. (cc @carbonin)

Ansible stdout

In this implementation, ansible stdout is stored in the MiqTask and its associated AnsibleRunnerWorkflow job. (cc @agrare) These stdouts can get really big, so it's probably best to store each one only once. We probably also do not want to store it in the MiqTask, as that record could eventually be cleaned up, so it's probably better to hang a binary_blob entry off of the ServiceAnsiblePlaybook instance.
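The store-once idea can be sketched as moving the data out of the transient task and onto the service. `BlobStore` and the hash-based "models" below are hypothetical stand-ins for BinaryBlob and MiqTask, not the real classes.

```ruby
# Hypothetical stand-in for BinaryBlob storage keyed by owner.
class BlobStore
  def initialize
    @blobs = {}
  end

  def store(owner_id, data)
    @blobs[owner_id] = data
  end

  def fetch(owner_id)
    @blobs[owner_id]
  end
end

task  = { :context_data => "PLAY RECAP ..." }       # MiqTask stand-in
blobs = BlobStore.new
# On workflow completion: extract the stdout from the task (removing it
# there, so it is stored exactly once) and hang it off the service.
blobs.store(:service_1, task.delete(:context_data))
```

After this, cleaning up the MiqTask no longer loses the stdout, since the service owns the only copy.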

Another complication here is how the UI is implemented, since this was originally a special casing for asynchronously fetching the stdout from AWX on-demand. In the original implementation, the backend code would start a special MiqTask specifically to get the output as HTML, and temporarily store it in the task. Then, the UI would wait_for_task, and when it was done delete the MiqTask.

None of this is needed anymore, and I think the backend code could be changed such that when the AnsibleRunnerWorkflow completes, the data is extracted from the MiqTask and stored as a binary_blob. Later, when the UI asks for the output, no MiqTask is needed: the data is already in the database and can be served directly. Even better, this can probably be done as a normal controller action, where the controller asks the model for the raw output and the TerminalToHtml call is done in the controller (since that's the more logical place to convert raw data to presentation HTML).

Automate methods that are playbooks directly (without the service/service catalog)

Automate methods that are playbooks directly can use the AnsiblePlaybookWorkflow directly. Unlike the Service modeling, which has its own execute and check_completed callouts, the automate methods do not.


TODO

Credential management

TODO

This section will likely need UI work.

Some settings in the service, such as logging, verbosity

TODO

Using the embedded_ansible or perhaps automate role

TODO

Upgrades

TODO

Tests

TODO
