Doc Improvement: Add doc about cyborg-nova interaction

Change-Id: I850a7e29880a9744b18523281756bab2de1d5492
Closes-Bug: #1930404
parent 1052efe93b
commit ade1972fc3
237
doc/source/admin/index.rst
Normal file
@ -0,0 +1,237 @@
====================
Acceleration Service
====================

OpenStack Cyborg is an acceleration service that lets you manage the lifecycle
of accelerators attached to instances on a cloud computing platform, and gives
you straightforward control over those accelerators.

Overview
--------

A good understanding of how Cyborg interacts with Nova and Placement helps
operators manage the acceleration service more effectively.

.. image:: ../figures/cyborg-nova-interaction.png
   :width: 700 px
   :scale: 99 %
   :align: center

Coexistence with PCI whitelists
-------------------------------

The operator tells Nova which PCI devices to claim and use by configuring the
PCI whitelist mechanism. In addition, the operator installs Cyborg drivers on
compute nodes and configures and enables them. Those drivers may then discover
and report some PCI devices. The operator must ensure that both configurations
are compatible.

Ideally, there would be a single way for the operator to identify which PCI
devices should be claimed by Nova and which by Cyborg. Until such a mechanism
exists, the operator shall use Cyborg’s configuration file to specify which
Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the
operator can and must ensure that none of these PCI IDs are included in Nova’s
PCI whitelist.
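
The compatibility check described above can be sketched in Python. This is
only an illustration and not part of Cyborg or Nova: the function name
``overlapping_pci_ids``, the driver names and the sample PCI IDs are
hypothetical, and the sketch assumes that the whitelist entries identify
devices by ``vendor_id`` and ``product_id``.

.. code:: python

   def overlapping_pci_ids(nova_whitelist, cyborg_driver_ids):
       """Return the PCI IDs claimed by both Nova and a Cyborg driver.

       nova_whitelist: list of dicts with "vendor_id" and "product_id".
       cyborg_driver_ids: dict mapping a driver name to a list of
           (vendor_id, product_id) tuples (hypothetical structure).
       """
       nova_ids = {
           (entry["vendor_id"].lower(), entry["product_id"].lower())
           for entry in nova_whitelist
       }
       conflicts = {}
       for driver, ids in cyborg_driver_ids.items():
           shared = {(v.lower(), p.lower()) for v, p in ids} & nova_ids
           if shared:
               conflicts[driver] = sorted(shared)
       return conflicts


   # Hypothetical sample data, for illustration only.
   nova_whitelist = [{"vendor_id": "8086", "product_id": "154d"}]
   cyborg_driver_ids = {
       "example_fpga_driver": [("8086", "09c4")],
       "example_nic_driver": [("8086", "154D")],
   }

   conflicts = overlapping_pci_ids(nova_whitelist, cyborg_driver_ids)

An empty result means that the two configurations do not claim the same PCI
IDs. Whitelist entries that select devices by address or device name would
need a different comparison.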

Placement update
----------------

The Cyborg conductor calls the Placement API directly to represent devices and
accelerators. Some of the intended use cases for these API calls are:

* Create or delete child resource providers (RPs) under the compute node RP.

* Create or delete custom resource classes (RCs) and custom traits.

* Associate traits with RPs or remove such associations.

* Update RP inventory.

Cyborg shall not modify the RPs created by any other component, such as Nova
virt drivers.
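
As a rough sketch of the use cases listed above, the following Python snippet
only describes the Placement REST calls for one device and sends nothing.
The helper name ``placement_requests_for_device``, the sample names and the
generation values are assumptions made for illustration; this is not Cyborg's
actual code, and a real client would also have to negotiate a suitable
Placement microversion.

.. code:: python

   import uuid


   def placement_requests_for_device(compute_rp_uuid, device_name,
                                     resource_class, traits, total):
       """Describe the Placement calls as (method, path, body) tuples."""
       for name in [resource_class, *traits]:
           if not name.startswith("CUSTOM_"):
               raise ValueError(f"{name} is not a custom name")
       device_rp_uuid = str(uuid.uuid4())
       rp_path = f"/resource_providers/{device_rp_uuid}"
       requests = [
           # Create a child RP under the compute node RP.
           ("POST", "/resource_providers",
            {"name": device_name, "uuid": device_rp_uuid,
             "parent_provider_uuid": compute_rp_uuid}),
           # Create the custom resource class.
           ("PUT", f"/resource_classes/{resource_class}", None),
       ]
       # Create each custom trait.
       requests += [("PUT", f"/traits/{trait}", None) for trait in traits]
       requests += [
           # Placeholder generations: a real client would use the
           # generation returned by the previous response.
           ("PUT", f"{rp_path}/traits",
            {"resource_provider_generation": 0, "traits": list(traits)}),
           ("PUT", f"{rp_path}/inventories",
            {"resource_provider_generation": 1,
             "inventories": {resource_class: {"total": total}}}),
       ]
       return requests


   # Hypothetical values, for illustration only.
   planned = placement_requests_for_device(
       "c0ffee00-0000-4000-8000-000000000001", "example_host_fpga_0",
       "CUSTOM_ACCEL_EXAMPLE", ["CUSTOM_FPGA_TRAITS"], total=1)

Every path in the sketch targets either the new child RP or a custom name,
which matches the rule that Cyborg does not modify RPs created by other
components.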

User Requests
-------------

The user request for accelerators is encapsulated in a device profile,
which is created and managed by the admin via the Cyborg API.

The overall structure of a ``device_profile`` looks like this:

.. code:: json

   {
       "device_profiles":[
           {
               "name":"fpga-dp1",
               "uuid":"5518a925-1c2c-49a2-a8bf-0927d9456f3e",
               "description": "",
               "groups":[
                   {
                       "trait:CUSTOM_FPGA_TRAITS":"required",
                       "resources:FPGA":"1",
                       "accel:bitstream_id":"d5ca2f11-3108-4426-a11c-a959987565df"
                   }
               ],
               "created_at": "2020-03-10 03:52:15+00:00",
               "updated_at": null,
               "links":[
                   {
                       "href":"http://192.168.32.217/accelerator/v2/device_profiles/5518a925-1c2c-49a2-a8bf-0927d9456f3e",
                       "rel":"self"
                   }
               ]
           }
       ]
   }
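
To make the key prefixes in a group easier to read, the sketch below splits
one group into resources, traits and ``accel:`` properties. The helper
``split_group`` is hypothetical and is not how Cyborg itself parses device
profiles; the sample group is copied from the structure shown above.

.. code:: python

   def split_group(group):
       """Split a device profile group by the prefix of each key."""
       parsed = {"resources": {}, "traits": {}, "accel": {}}
       for key, value in group.items():
           prefix, _, name = key.partition(":")
           if prefix == "resources":
               # Resource amounts are given as strings, e.g. "1".
               parsed["resources"][name] = int(value)
           elif prefix == "trait":
               parsed["traits"][name] = value
           elif prefix == "accel":
               parsed["accel"][name] = value
           else:
               raise ValueError(f"unexpected key: {key}")
       return parsed


   group = {
       "trait:CUSTOM_FPGA_TRAITS": "required",
       "resources:FPGA": "1",
       "accel:bitstream_id": "d5ca2f11-3108-4426-a11c-a959987565df",
   }
   parsed = split_group(group)

Here ``parsed["resources"]`` holds the requested amount per resource class,
while the ``accel`` entries carry properties such as the bitstream ID.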

The operator folds the device profile into the flavor as an extra spec, as
below:

.. code:: bash

   openstack flavor set --property 'accel:device_profile=<profile_name>' flavor

The standard Nova API can then be used to create an instance by specifying
only the flavor (without naming the device profile), like this:

.. code:: bash

   openstack server create --flavor f ....  # instance creation

In the future, a device profile may be used by itself to specify accelerator
resources in the instance creation API.

Updating the Request Spec
-------------------------

When the user submits a request to create an instance, as described in the
User Requests section, Nova needs to call a Cyborg API to get back the
resource request groups in the device profile and merge them into the request
spec.

This call, like all the others that Nova makes to Cyborg APIs, is done
through a Keystone-based adapter that locates the Cyborg service, similar
to the way Nova calls Placement. A Cyborg client module added to Nova
encapsulates such calls.

VM images in Glance may be associated with image properties (other than image
traits), such as bitstream/function IDs needed for that image. So, Nova should
pass the VM image UUID from the request spec to Cyborg.

The groups in the device profile are numbered by Cyborg. The request groups
that are merged into the request spec are numbered by Nova. These numberings
are not the same in general; that is, the N-th device profile group may not
correspond to the N-th request group in the request spec.

When the device profile request groups are added to other request groups in the
flavor, the ``group_policy`` of the flavor shall govern the overall semantics
of all request groups.

Accelerator Requests
--------------------

An accelerator request (ARQ) is an object that represents the state of the
request for an accelerator to be assigned to an instance. Cyborg creates and
manages ARQs and persists them in the Cyborg database.

By definition, an ARQ represents a request for a single accelerator. If the
device profile in the user request has N request groups, each asking for M
accelerators, then N * M ARQs are created for that device profile.
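
The N * M rule can be checked with a few lines of Python. This is a sketch
under the assumption that the number of accelerators requested by a group is
the sum of its ``resources:`` amounts; the helper ``expected_arq_count`` and
the sample groups are hypothetical.

.. code:: python

   def expected_arq_count(groups):
       """Count one ARQ per accelerator requested across all groups."""
       return sum(
           int(amount)
           for group in groups
           for key, amount in group.items()
           if key.startswith("resources:")
       )


   # Two request groups (N = 2), each asking for two accelerators (M = 2).
   groups = [
       {"resources:FPGA": "2", "trait:CUSTOM_FPGA_TRAITS": "required"},
       {"resources:FPGA": "2"},
   ]
   arq_count = expected_arq_count(groups)

With these sample groups, ``arq_count`` is 4, that is, N * M.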

When an ARQ is initially created by Cyborg, it is not yet associated with a
specific host name or a device resource provider, so it is said to be in an
unbound state. Subsequently, Nova calls Cyborg to bind the ARQ to a host name,
a device RP UUID and an instance UUID. If the instance fails to spawn, Nova
unbinds the ARQ without deleting it. On instance termination, Nova deletes
the ARQs after unbinding them.
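
The bound and unbound states can be pictured with a small Python class. This
is a simplified model for illustration only: the class ``ArqSketch`` and its
methods are hypothetical and do not reflect Cyborg's real ARQ object or its
state names.

.. code:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class ArqSketch:
       """Toy model of an ARQ that is either unbound or bound."""

       uuid: str
       device_profile_name: str
       hostname: Optional[str] = None
       device_rp_uuid: Optional[str] = None
       instance_uuid: Optional[str] = None

       @property
       def is_bound(self):
           return self.hostname is not None

       def bind(self, hostname, device_rp_uuid, instance_uuid):
           # Binding associates the ARQ with a host, a device RP
           # and an instance.
           self.hostname = hostname
           self.device_rp_uuid = device_rp_uuid
           self.instance_uuid = instance_uuid

       def unbind(self):
           # Unbinding clears the association; the ARQ itself remains.
           self.hostname = None
           self.device_rp_uuid = None
           self.instance_uuid = None


   # Hypothetical identifiers, for illustration only.
   arq = ArqSketch(uuid="arq-1", device_profile_name="fpga-dp1")
   arq.bind("compute-1", "rp-uuid-1", "instance-uuid-1")
   bound_after_bind = arq.is_bound
   arq.unbind()  # for example, after a failed spawn
   bound_after_unbind = arq.is_bound

After ``unbind()`` the object still exists, which mirrors the distinction
between unbinding an ARQ and deleting it.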

Each ARQ needs to be matched to the specific RP in the allocation candidate
that Nova has chosen before the ARQ is bound. The current Nova code maps
request groups to RPs, while the Cyborg client module in Nova
(cyborg-client-module) matches ARQs to request groups. The matching is done
using the ``request_id`` field in the ``RequestGroup`` object, as follows:

* The order of request groups in a device profile is not significant, but it is
  preserved by Cyborg. Thus, each device profile request group has a unique
  index.

* When the device profile request groups returned by Cyborg are added to the
  request spec, the ``request_id`` field is set to ``device_profile_<N>`` for
  the N-th device profile request group (starting from zero). The device
  profile name need not be included here because there is only one device
  profile per request spec.

* When Cyborg creates an ARQ for a device profile, it embeds the device profile
  request group index in the ARQ before returning it to Nova.

* The matching is done in two steps:

  * Each ARQ is mapped to a specific request group in the request spec using
    the ``request_id`` field.

  * Each request group is mapped to a specific RP using the same logic as the
    Neutron bandwidth provider.
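
The two matching steps above can be sketched as follows. The function
``match_arqs_to_rps``, the dictionary layout and the field name
``device_profile_group_id`` are assumptions made for this illustration; the
mapping from request groups to RPs is taken as given, since Nova computes it.

.. code:: python

   def match_arqs_to_rps(arqs, request_group_to_rp):
       """Map each ARQ UUID to the RP chosen for its request group."""
       matches = {}
       for arq in arqs:
           # Step 1: derive the request_id from the group index
           # that Cyborg embedded in the ARQ.
           request_id = f"device_profile_{arq['device_profile_group_id']}"
           # Step 2: look up the RP that Nova mapped to that request group.
           matches[arq["uuid"]] = request_group_to_rp[request_id]
       return matches


   # Hypothetical sample data, for illustration only.
   arqs = [
       {"uuid": "arq-a", "device_profile_group_id": 0},
       {"uuid": "arq-b", "device_profile_group_id": 1},
   ]
   request_group_to_rp = {
       "device_profile_0": "rp-uuid-fpga-0",
       "device_profile_1": "rp-uuid-fpga-1",
   }
   matches = match_arqs_to_rps(arqs, request_group_to_rp)

The lookup key depends only on the group index, so the device profile name
does not have to be part of ``request_id``.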

Cyborg and Nova interaction workflow
------------------------------------

The interaction flow is captured by the following sequence diagram, in which
the Nova conductor and scheduler are together represented as the Nova
controller.

.. image:: ../figures/cyborg-nova-interaction-workflow.svg

A Cyborg client module (cyborg-client-module) is added to Nova. All Cyborg API
calls are routed through it.

1. The Nova API server receives a ``POST /servers`` API request with a flavor
   that includes a device profile name.

2. The Nova API server calls the Cyborg API
   ``GET /v2/device_profiles?name=$device_profile_name`` and gets back the
   device profile. The request groups in that device profile are added to the
   request spec.

3. The Nova scheduler invokes Placement and gets a list of allocation
   candidates. It selects one of those candidates and makes claim(s) in
   Placement. The Nova conductor then sends the RPC message
   ``build_and_run_instances`` to the Nova compute manager.

4. The Nova conductor manager calls the Cyborg API
   ``POST /v2/accelerator_requests`` with the device profile name. Cyborg
   creates a set of unbound ARQs for that device profile and returns them to
   Nova.

5. The Cyborg client in Nova matches each ARQ to the resource provider picked
   for that accelerator.

6. The Nova compute manager calls the Cyborg API
   ``PATCH /v2/accelerator_requests`` to bind the ARQ with the host name,
   the device’s RP UUID and the instance UUID. This is an asynchronous call
   that prepares or reconfigures the device in the background.

7. Cyborg, on completion of the bindings (successfully or otherwise),
   calls Nova’s ``POST /os-server-external-events`` API with:

   .. code::

      {
          "events": [
             { "name": "accelerator-request-bound",
               "tag": $device_profile_name,
               "server_uuid": $instance_uuid,
               "status": "completed" # or "failed"
             },
             ...
          ]
      }

8. The Nova compute manager waits for the notification, subject to the timeout
   mentioned in Section Other deployer impact. It then calls the Cyborg REST
   API ``GET /v2/accelerator_requests?instance=<uuid>&bind_state=resolved``.

9. The Nova virt driver uses the attach handles returned from the Cyborg call
   to compose PCI passthrough devices into the VM’s definition.

10. If there is any error after binding has been initiated, Nova must unbind
    the relevant ARQs by calling the Cyborg API. It may then retry on another
    host or delete the (unbound) ARQs for the instance.
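
The notification in step 7 can be illustrated with a short Python sketch that
builds the event payload and checks its outcome. The helper names
``build_arq_bound_event`` and ``binding_succeeded`` are hypothetical, the
sample identifiers are made up, and nothing is sent over the network; only
the payload fields are taken from the step above.

.. code:: python

   def build_arq_bound_event(device_profile_name, instance_uuid, succeeded):
       """Build the body for POST /os-server-external-events."""
       return {
           "events": [
               {
                   "name": "accelerator-request-bound",
                   "tag": device_profile_name,
                   "server_uuid": instance_uuid,
                   "status": "completed" if succeeded else "failed",
               }
           ]
       }


   def binding_succeeded(payload, instance_uuid):
       """Return True only if every binding event for the instance completed."""
       events = [
           event for event in payload["events"]
           if event["name"] == "accelerator-request-bound"
           and event["server_uuid"] == instance_uuid
       ]
       return bool(events) and all(
           event["status"] == "completed" for event in events)


   # Hypothetical identifiers, for illustration only.
   ok_payload = build_arq_bound_event("fpga-dp1", "instance-uuid-1", True)
   failed_payload = build_arq_bound_event("fpga-dp1", "instance-uuid-1", False)
   ok_result = binding_succeeded(ok_payload, "instance-uuid-1")
   failed_result = binding_succeeded(failed_payload, "instance-uuid-1")

A failed result corresponds to the error path in step 10, where Nova unbinds
the relevant ARQs.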
|
83
doc/source/figures/cyborg-nova-interaction-workflow.svg
Normal file
@ -0,0 +1,83 @@
<svg height="885" viewBox="0 0 856 885" width="856" xmlns="http://www.w3.org/2000/svg" xmlns:inkspace="http://www.inkscape.org/namespaces/inkscape" xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs id="defs_block">
    <filter height="1.504" id="filter_blur" inkspace:collect="always" width="1.1575" x="-0.07875" y="-0.252">
      <feGaussianBlur id="feGaussianBlur3780" inkspace:collect="always" stdDeviation="4.2"></feGaussianBlur>
    </filter>
  </defs>
  <title>blockdiag</title>
  <desc></desc>
  <rect fill="rgb(0,0,0)" height="40" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="67" y="21"></rect>
  <rect fill="rgb(0,0,0)" height="40" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="267" y="21"></rect>
  <rect fill="rgb(0,0,0)" height="40" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="467" y="21"></rect>
  <rect fill="rgb(0,0,0)" height="40" stroke="rgb(0,0,0)" style="filter:url(#filter_blur);opacity:0.7;fill-opacity:1" width="128" x="667" y="21"></rect>
  <path d="M 128 55 L 128 885" fill="none" stroke="rgb(0,0,0)" stroke-dasharray="8 4"></path>
  <path d="M 328 55 L 328 885" fill="none" stroke="rgb(0,0,0)" stroke-dasharray="8 4"></path>
  <path d="M 528 55 L 528 885" fill="none" stroke="rgb(0,0,0)" stroke-dasharray="8 4"></path>
  <path d="M 728 55 L 728 885" fill="none" stroke="rgb(0,0,0)" stroke-dasharray="8 4"></path>
  <rect fill="rgb(255,255,255)" height="40" stroke="rgb(0,0,0)" width="128" x="64" y="15"></rect>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="90" x="128.0" y="41">Nova Controller</text>
  <rect fill="rgb(255,255,255)" height="40" stroke="rgb(0,0,0)" width="128" x="264" y="15"></rect>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="54" x="328.0" y="41">Placement</text>
  <rect fill="rgb(255,255,255)" height="40" stroke="rgb(0,0,0)" width="128" x="464" y="15"></rect>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="36" x="528.0" y="41">Cyborg</text>
  <rect fill="rgb(255,255,255)" height="40" stroke="rgb(0,0,0)" width="128" x="664" y="15"></rect>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="72" x="728.0" y="41">Nova Compute</text>
  <path d="M 136 85 L 520 85" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="512,81 520,85 512,89" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 125 L 520 125" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="144,121 136,125 144,129" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 191 L 224 191" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 224 191 L 224 207" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 224 207 L 136 207" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="144,203 136,207 144,211" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 247 L 320 247" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="312,243 320,247 312,251" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 300 L 320 300" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="144,296 136,300 144,304" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 353 L 224 353" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 224 353 L 224 369" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 224 369 L 136 369" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="144,365 136,369 144,373" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 409 L 720 409" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="712,405 720,409 712,413" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 462 L 720 462" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="544,458 536,462 544,466" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 502 L 720 502" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="712,498 720,502 712,506" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 555 L 720 555" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="544,551 536,555 544,559" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 595 L 720 595" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="712,591 720,595 712,599" stroke="rgb(0,0,0)"></polygon>
  <path d="M 136 635 L 520 635" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="144,631 136,635 144,639" stroke="rgb(0,0,0)"></polygon>
  <path d="M 736 701 L 816 701" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 816 701 L 816 717" fill="none" stroke="rgb(0,0,0)"></path>
  <path d="M 816 717 L 736 717" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="744,713 736,717 744,721" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 796 L 720 796" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="544,792 536,796 544,800" stroke="rgb(0,0,0)"></polygon>
  <path d="M 536 836 L 720 836" fill="none" stroke="rgb(0,0,0)"></path>
  <polygon fill="rgb(0,0,0)" points="712,832 720,836 712,840" stroke="rgb(0,0,0)"></polygon>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="198" x="239.0" y="83">GET /v2/device_profiles?name=mydp</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="216" x="412.0" y="123">{"device_profiles": $device_profile}</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="108" x="190.0" y="163">Merge request grou</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="108" x="190.0" y="176">ps into request_sp</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="12" x="142.0" y="189">ec</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="156" x="218.0" y="245">Get /allocation_candidates</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="180" x="230.0" y="285">allocation candidates with nes</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="42" x="299.0" y="298">ted RPs</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="108" x="190.0" y="345">Select a candidate</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="150" x="215.0" y="407">build_and_run_instances()</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="174" x="633.0" y="454">POST /v2/accelerator_requests</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="600.0" y="500">{"arqs": [$arq, ...]</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="180" x="630.0" y="547">PATCH /v2/accelerator_requests</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="120" x="600.0" y="593">{"arqs": [$arq, ...]</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="186" x="427.0" y="633">POST /os-server-external-events</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="96" x="784.0" y="673">Wait for notific</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="96" x="784.0" y="686">ation from Cybor</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="6" x="739.0" y="699">g</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="174" x="633.0" y="762">GET /v2/accelerator_requests?</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="180" x="630.0" y="775">instance=$uuid&amp;bind_state=reso</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="24" x="708.0" y="788">lved</text>
  <text fill="rgb(0,0,0)" font-family="sans-serif" font-size="11" font-style="normal" font-weight="normal" text-anchor="middle" textLength="132" x="606.0" y="834">{"arqs": [$arq, ....]}</text>
</svg>
After Width: | Height: | Size: 9.5 KiB |
BIN
doc/source/figures/cyborg-nova-interaction.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 37 KiB |
@ -29,6 +29,7 @@ Installation
   install/install-from-pip
   install/install-from-source
   admin/config-wsgi
   admin/index

Configuration Reference
~~~~~~~~~~~~~~~~~~~~~~~
@ -39,6 +40,14 @@ Configuration Reference
   configuration/index
   reference/support-matrix

Maintenance
~~~~~~~~~~~

Once you are running Cyborg, the following information is useful.

* :doc:`Admin Guide </admin/index>`: A collection of guides for administering
  Cyborg.

For End Users
-------------