Move "Cloud Cron" cookbook from Wiki to the built-in docs

* Since Mistral Wiki has been retired we need to move all important
  info published there into the docs residing in the project repo.
  This patch moves the article published at
  https://wiki.openstack.org/wiki/Mistral/Cookbooks/AdministrationCloudCron
  to the docs.

Change-Id: I4764fb3beba1ff05091a4ce04c93fe56d39528d5
Partial-Bug: #1779318

parent
9fb213c315
commit
5c05636ee4
@ -1,4 +0,0 @@
|
||||
Mistral Cookbooks
|
||||
=================
|
||||
|
||||
- `Mistral for Administration (aka Cloud Cron) <https://wiki.openstack.org/wiki/Mistral/Cookbooks/AdministrationCloudCron>`_
|
340
doc/source/user/cookbooks/cloud_cron.rst
Normal file
@ -0,0 +1,340 @@
|
||||
===========================================
|
||||
Mistral for Administration (aka Cloud Cron)
|
||||
===========================================
|
||||
|
||||
Prerequisites
|
||||
=============
|
||||
|
||||
The reader should be familiar with basic Mistral concepts such as
workflows, tasks, actions, cron triggers and the YAQL expression language.
Please refer to the corresponding sections of :doc:`/user/index` for more
information.
|
||||
|
||||
Background
|
||||
==========
|
||||
|
||||
When it comes to managing IT infrastructure such as a cloud or a data
center, system administrators typically need to handle many different
tasks. To name just a few:
|
||||
|
||||
* Update Linux kernel or specific software on all or a subset of servers
|
||||
* Re-configure certain software on a subset of servers
|
||||
* Crawl data from a subset of servers and build a report based on this data
|
||||
* Check health of certain software on a subset of servers or health of
|
||||
servers themselves
|
||||
|
||||
It's worth adding that any of the tasks listed above may need to be done
periodically according to a specified schedule. Dealing with them requires
a lot of human attention unless special software is used to automate them.
|
||||
|
||||
In this article we'll take an OpenStack cloud tenant as an example of IT
infrastructure that a system administrator needs to manage, and see how
the Mistral workflow service can be useful for addressing those cases and
why a workflow technology in particular is worth using.
|
||||
|
||||
Important aspects
|
||||
=================
|
||||
|
||||
So what does it take to solve any of the above problems? Let's have a look
at a fairly simple task such as upgrading the Linux kernel on a single
server. It requires the following:
|
||||
|
||||
* Download new Linux kernel packages
|
||||
* Install packages
|
||||
* Reboot the server
|
||||
|
||||
|
||||
Looks pretty simple to do. However, things get more complicated when:
|
||||
|
||||
* We want to do this for multiple servers
|
||||
* We need to clearly see which servers have been successfully updated and
|
||||
which haven’t after this sequence is completed on all the servers
|
||||
* We need to run this sequence automatically on a periodic basis
|
||||
|
||||
|
||||
For example, if we automate this by just writing a script (as
administrators usually do), whether in Shell or Python, we'll quickly see
that taking care of these aspects is pretty challenging. To do the job
efficiently it makes sense to process all the servers in parallel and,
once all of them have been processed, send a notification showing whether
everything went fine or issues occurred during some of the operations.
Additionally, if the script runs on a single machine and fails for
whatever reason, the whole process of updating a hundred servers will not
complete and will end up in an unknown state.
|
||||
|
||||
.. image:: img/cloud_cron_updating_multiple_servers.png
    :alt: Updating multiple tenant servers
|
||||
|
||||
So that shows that we need to take care of at least:
|
||||
|
||||
* Parallel execution
|
||||
* Persistent state giving info about what happened with every server (at
|
||||
minimum, success or failure)
|
||||
* High availability to make sure the whole thing will complete
|
||||
* Notification mechanism so that we don’t have to check the status of the
|
||||
process manually
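
To make these concerns concrete, here is a minimal Python sketch of what a
hand-written script would have to do just for the parallel execution and
per-server state. It is not part of Mistral. ``upgrade_host`` is a
hypothetical placeholder for the real download/install/reboot steps, and the
host addresses are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_host(host):
    """Hypothetical placeholder for the real per-server upgrade steps."""
    if host.endswith(".13"):
        # Simulate a failure on one server.
        raise RuntimeError("package download failed")
    return "upgraded"


def upgrade_all(hosts, max_workers=8):
    """Run upgrade_host on every host in parallel and record each outcome."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(upgrade_host, host) for host in hosts}
        for host, future in futures.items():
            error = future.exception()  # blocks until this host is done
            results[host] = "SUCCESS" if error is None else f"ERROR: {error}"
    return results


def build_report(results):
    """Summarize the outcome so it could be sent as a notification."""
    failed = sorted(h for h, state in results.items() if state != "SUCCESS")
    updated = len(results) - len(failed)
    return f"{updated} of {len(results)} servers updated; failed: {failed}"


results = upgrade_all(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
report = build_report(results)
print(report)
```

Note that ``results`` lives only in the memory of one process: if the script
dies halfway, the state is lost. That covers the first two bullets at best and
leaves high availability unsolved.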
|
||||
|
||||
And, as a matter of fact, all of this has to be repeated every time we need
to do something similar. A notification mechanism is not a must if we always
run this upgrade manually and it doesn't take long. If a human doesn't
control when it starts and/or it takes a long time, notifications become
very important. All of this means that we most likely need an external tool
that takes care of these concerns. A workflow technology like the Mistral
workflow service is exactly the type of tool that can help deal with those
problems.
|
||||
|
||||
Mistral-based solution
|
||||
======================
|
||||
|
||||
Let's now show how we can solve this kind of task with Mistral and explore
in detail how Mistral addresses the aforementioned concerns.
|
||||
|
||||
Updating Linux kernel on all tenant VMs
|
||||
=======================================
|
||||
|
||||
As an example, let's see how we can upgrade the Linux kernel version on all
cloud tenant servers (virtual machines, or just VMs), assuming they all have
Ubuntu installed. We'll also make some assumptions about how we access guest
operating systems, which we'll mention separately. In fact, those
assumptions don't change much from the perspective of the overall approach,
so it remains applicable if we alter some details, such as using an
operating system other than Ubuntu.
|
||||
|
||||
This use case is fairly simple but it demonstrates the essential advantages
|
||||
of using a workflow technology.
|
||||
|
||||
Initial workflow
|
||||
================
|
||||
|
||||
The central Mistral concept is the workflow, so first of all we need to
create a Mistral workflow that contains the logic of updating the Linux
kernel on multiple tenant servers. Let's create a text file named
*update_kernel.yaml* in any convenient text editor:
|
||||
|
||||
::

    ---
    version: '2.0'

    upgrade_kernel:
      input:
        - username: ubuntu
        - private_key_filename
        - gateway_host

      tasks:
        get_hosts:
          action: nova.servers_list
          publish:
            hosts: <% task(get_hosts).result.select({ip => $.addresses.get($.addresses.keys().first()).where($.get("OS-EXT-IPS:type") = fixed).first().addr}).ip %>
          keep-result: false
          on-success: upgrade

        upgrade:
          with-items: host in <% $.hosts %>
          action: std.ssh_proxied
          input:
            host: <% $.host %>
            gateway_host: <% $.gateway_host %>
            username: <% $.username %>
            private_key_filename: <% $.private_key_filename %>
            cmd: "sudo apt-get update && sudo apt-get install linux-image-generic-lts-$(lsb_release -sc) -y && sudo reboot"
|
||||
|
||||
This is the simplest version of a Mistral workflow that does what we need.
Let's see what it consists of. It has two task definitions: “get_hosts”
and “upgrade”.
|
||||
|
||||
“get_hosts” calls the Nova action “nova.servers_list”, which returns
information about all servers in a tenant as a JSON list. What we really
need here is to extract their IP addresses. In order to do that we declare
a “publish” clause that introduces a new variable in the workflow context
called “hosts”, which will contain a list of IPs. The YAQL expression used
to extract the IP addresses is fairly intricate simply because of how Nova
structures networking information.
|
||||
|
||||
NOTE: it’s easy to see in what form Nova returns info about a server
|
||||
just by running:
|
||||
|
||||
.. code-block:: bash

    $ mistral run-action nova.servers_get '{"server": "<server-id>"}'
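
To make the “publish” expression easier to follow, here is a rough Python
equivalent of what it computes. This is only an illustrative sketch:
``extract_fixed_ips`` is a hypothetical helper, and ``sample_servers`` is a
trimmed, made-up sample that keeps only the fields the expression touches.
Like the expression, it assumes every server has at least one fixed address
on its first network.

```python
def extract_fixed_ips(servers):
    """Return the first fixed IP on the first network of each server."""
    ips = []
    for server in servers:
        addresses = server["addresses"]
        # $.addresses.keys().first(): name of the first network
        first_network = next(iter(addresses))
        # .where($.get("OS-EXT-IPS:type") = fixed): keep fixed addresses only
        fixed = [
            entry for entry in addresses[first_network]
            if entry.get("OS-EXT-IPS:type") == "fixed"
        ]
        # .first().addr: take the address of the first match
        ips.append(fixed[0]["addr"])
    return ips


# Made-up sample shaped like the "addresses" part of Nova server data.
sample_servers = [
    {"name": "vm-1", "addresses": {"private": [
        {"addr": "10.0.0.5", "version": 4, "OS-EXT-IPS:type": "fixed"},
        {"addr": "172.16.74.8", "version": 4, "OS-EXT-IPS:type": "floating"},
    ]}},
    {"name": "vm-2", "addresses": {"private": [
        {"addr": "10.0.0.6", "version": 4, "OS-EXT-IPS:type": "fixed"},
    ]}},
]

hosts = extract_fixed_ips(sample_servers)
print(hosts)  # ['10.0.0.5', '10.0.0.6']
```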
|
||||
|
||||
It's worth noting that in Mistral the result of a task is the result of
its action (or workflow), so we set the special task property “keep-result”
to “false” so that the result doesn't get stored in the workflow context.
We do this because we're not interested in all the information that Nova
returns; only the IPs are relevant. This matters because even for a tenant
with 30 virtual servers, the information about them returned by Nova takes
~100 KB of disk space.
|
||||
|
||||
Task “upgrade” is where the most interesting things happen. It leverages
the “with-items” functionality to iterate over the list of server IPs and
ssh to each of the servers in order to upgrade the kernel. The word
“iterate” doesn't mean that processing is sequential, though. On the
contrary, this is where Mistral runs the kernel upgrade in parallel. Every
action execution object for “std.ssh_proxied” is stored in the database and
keeps the state and result of the upgrade operation on a particular virtual
server.
|
||||
|
||||
An attentive reader may have noticed the suffix "proxied" in the name of
the action "std.ssh_proxied" and asked "What does it mean? Why not just
"std.ssh", which Mistral also has in its standard action pack?" This brings
us back to the assumption about how we access the guest operating system.
By default, Mistral can't really get secure shell access to guest VMs
because of how the cloud isolates the management network, where all
OpenStack services reside, from guest networks. In fact, if a server doesn't
have a floating IP then no service running in the management network can
reach that server, since it is simply in a different network. In our
particular example, we assume that at least one VM in the tenant has a
floating IP address so that it can be used as an ssh gateway through which
we can ssh to the other VMs. That's why we're using a special action called
"std.ssh_proxied", where "proxied" means that we have a proxy VM to access
all tenant VMs.
|
||||
|
||||
.. image:: img/ssh_proxied.png
    :alt: Ssh access through a gateway VM
|
||||
|
||||
Mistral is a distributed, highly available system, designed not only to
survive infrastructure failures but also to keep its workflows running.
That's why we can be sure that a process automated with a workflow service
such as Mistral will finish even if components of the control system fail,
which in our case are the Mistral engine and executors.
|
||||
|
||||
Adding notifications
|
||||
====================
|
||||
|
||||
What our workflow is missing is the ability to notify a cloud operator when
the kernel upgrade has completed on all servers. In order to do that we just
need to add one more task; let's call it “send_success_email”. The full
workflow would now look like:
|
||||
|
||||
::

    ---
    version: '2.0'

    upgrade_kernel:
      input:
        - username: ubuntu
        - private_key_filename
        - gateway_host
        - email_info: null # [to_email, from_email, smtp_server, smtp_password]

      tasks:
        get_hosts:
          action: nova.servers_list
          publish:
            hosts: <% task(get_hosts).result.select({ip => $.addresses.get($.addresses.keys().first()).where($.get("OS-EXT-IPS:type") = fixed).first().addr}).ip %>
          keep-result: false
          on-success: upgrade

        upgrade:
          with-items: host in <% $.hosts %>
          action: std.ssh_proxied
          input:
            host: <% $.host %>
            gateway_host: <% $.gateway_host %>
            username: <% $.username %>
            private_key_filename: <% $.private_key_filename %>
            cmd: "sudo apt-get update && sudo apt-get install linux-image-generic-lts-$(lsb_release -sc) -y && sudo reboot"
          on-success:
            - send_success_email: <% $.email_info != null %>

        send_success_email:
          action: std.email
          input:
            subject: Linux kernel on tenant VMs successfully updated
            body: |
              Number of updated VMs: <% $.hosts.len() %>

              -- Thanks
            from_addr: <% $.email_info.from_email %>
            to_addrs: [<% $.email_info.to_email %>]
            smtp_server: <% $.email_info.smtp_server %>
            smtp_password: <% $.email_info.smtp_password %>
|
||||
|
||||
Note that along with the new task we've also added an “on-success” clause
to the “upgrade” task that defines a transition to “send_success_email” on
successful completion of “upgrade”. This transition is conditional: it only
fires if we passed the data needed to send an email as an input parameter.
That's why this new version of the workflow has a new input parameter
called “email_info”. “email_info” is expected to be a data structure that
consists of the fields “from_email”, “to_email”, “smtp_server” and
“smtp_password”.
|
||||
|
||||
Uploading workflow to Mistral
|
||||
=============================
|
||||
|
||||
Assuming we have installed the Mistral client, we can upload this workflow
to Mistral with the command:

.. code-block:: bash

    $ mistral workflow-create update_kernel.yaml
|
||||
|
||||
Normal output of this command (and most others) shows a table with a newly
|
||||
uploaded workflow. It may look like:
|
||||
|
||||
.. code-block:: bash

    +----------------+--------+------------------------------+----------------------------+------------+
    | Name           | Tags   | Input                        | Created at                 | Updated at |
    +----------------+--------+------------------------------+----------------------------+------------+
    | upgrade_kernel | <none> | username=ubuntu, private_... | 2015-10-19 10:32:27        | None       |
    +----------------+--------+------------------------------+----------------------------+------------+
|
||||
|
||||
NOTE: In order to print all available workflows run:
|
||||
|
||||
.. code-block:: bash

    $ mistral workflow-list
|
||||
|
||||
Running the workflow
|
||||
====================
|
||||
|
||||
Now that Mistral knows about the workflow “upgrade_kernel” we can start it
by running:

.. code-block:: bash

    $ mistral execution-create upgrade_kernel input.json
|
||||
|
||||
The file input.json should contain the workflow input data in JSON, such as:

.. code-block:: json

    {
        "private_key_filename": "my_key.pem",
        "gateway_host": "172.16.74.8"
    }
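
If the email notification is wanted, the input also needs “email_info”. The
following Python sketch shows one way to generate such an input document. It
is only an illustration: ``build_workflow_input`` is a hypothetical helper,
and all email values are placeholders. The field names are the ones the
workflow reads.

```python
import json


def build_workflow_input(private_key_filename, gateway_host, email_info=None):
    """Build the input dict for upgrade_kernel; email_info is optional."""
    workflow_input = {
        "private_key_filename": private_key_filename,
        "gateway_host": gateway_host,
    }
    # Without email_info the workflow keeps its default (null) and the
    # conditional transition to send_success_email does not fire.
    if email_info is not None:
        workflow_input["email_info"] = email_info
    return workflow_input


email_info = {  # placeholder values, replace with real ones
    "to_email": "admin@example.com",
    "from_email": "mistral@example.com",
    "smtp_server": "smtp.example.com",
    "smtp_password": "<smtp-password>",
}

workflow_input = build_workflow_input("my_key.pem", "172.16.74.8", email_info)
input_json = json.dumps(workflow_input, indent=2)
print(input_json)
```

The printed text can be saved as input.json and passed to
``mistral execution-create`` as shown above.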
|
||||
|
||||
Configuring a Cron Trigger
|
||||
==========================
|
||||
|
||||
In order to make this workflow run periodically we need to create a cron
trigger:

.. code-block:: bash

    $ mistral cron-trigger-create update_kernel_weekly upgrade_kernel --pattern "0 2 * * mon"
|
||||
|
||||
In order to print all active cron triggers run:
|
||||
|
||||
.. code-block:: bash

    $ mistral cron-trigger-list
|
||||
|
||||
From now on the workflow we created will be started every Monday at 2:00 am
and will update the Linux kernel on all servers in the tenant we are logged
in to.
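
To spell out how the pattern ``0 2 * * mon`` maps to "every Monday at
2:00 am", here is a small Python sketch that names the five cron fields and
computes the next firing time for this one pattern. It is not how Mistral
evaluates patterns. ``next_monday_2am`` is a hypothetical helper written only
for this example, and time zones are ignored.

```python
from datetime import datetime, timedelta

FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]
fields = dict(zip(FIELD_NAMES, "0 2 * * mon".split()))
print(fields)


def next_monday_2am(now):
    """Next firing time of '0 2 * * mon' strictly after `now`."""
    candidate = now.replace(hour=2, minute=0, second=0, microsecond=0)
    # datetime.weekday() returns 0 for Monday; move forward to a Monday.
    candidate += timedelta(days=(0 - candidate.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate


# 2015-10-19 was a Monday; 10:32 is already past that day's 2:00 am slot.
next_run = next_monday_2am(datetime(2015, 10, 19, 10, 32, 27))
print(next_run)  # 2015-10-26 02:00:00
```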
|
||||
|
||||
What's important about Mistral cron triggers is that they are also a
distributed, fault-tolerant mechanism. That means that if a number of
Mistral engines crash, cron triggers will keep working because there's no
single point of failure for them.
|
||||
|
||||
If we no longer need to upgrade kernel periodically we can just delete the
|
||||
trigger:
|
||||
|
||||
.. code-block:: bash

    $ mistral cron-trigger-delete update_kernel_weekly
|
Binary file not shown.
After Width: | Height: | Size: 31 KiB |
BIN
doc/source/user/cookbooks/img/ssh_proxied.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 39 KiB |
8
doc/source/user/cookbooks/index.rst
Normal file
@ -0,0 +1,8 @@
|
||||
=================
|
||||
Mistral Cookbooks
|
||||
=================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
cloud_cron
|
@ -23,4 +23,4 @@ info on concrete features.
|
||||
wf_lang_v2
|
||||
rest_api_v2
|
||||
cli/index
|
||||
cookbooks
|
||||
cookbooks/index
|
||||
|