Initial cyborg api and db design proposal

This spec proposes the initial design for the Cyborg API. The Cyborg API should support the basic operations concerning accelerators, and does not necessarily have to be a user-facing API at the early stage. The API should support functionalities such as provision, attach, detach, list and update. This spec also contains the proposal for a simple DB for Cyborg. Note that although we propose a DB schema for Cyborg, the implementation should be aligned with the resource provider DB schema as much as possible.

APIImpact
Change-Id: I98c74df91f4548ecef42d2e3f96facf9023a346a
Signed-off-by: zhipengh <huangzhipeng@huawei.com>
2017-03-15 16:24:39 +08:00 · 2017-03-15 16:24:39 +08:00 · b8669f18e6
commit b8669f18e6
parent 2b01cb135a
3 changed files with 802 additions and 6 deletions
--- a/specs/pike/README.rst
+++ b/specs/pike/README.rst
@@ -1,6 +0,0 @@
-============
-Cyborg Specs
-============
-
-This folder contains all the spec files.
-
--- a/specs/pike/proposal/cyborg-api-proposal.rst
+++ b/specs/pike/proposal/cyborg-api-proposal.rst
@@ -0,0 +1,410 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+===================
+Cyborg API proposal
+===================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-api
+
+This spec proposes the initial API design for Cyborg.
+
+Problem description
+===================
+
+Cyborg, as a common management framework for dedicated devices
+(hardware/software accelerators, high-speed storage, etc.), needs a RESTful
+API to expose its basic functionality.
+
+Use Cases
+---------
+
+* As a user, I want to be able to spawn a VM with dedicated hardware, so
+  that I can utilize the provided hardware.
+* As a compute service, I need to know how the requested resource should
+  be attached to the VM.
+* As a scheduler service, I'd like to know on which resource provider the
+  requested resource can be found.
+
+Proposed change
+===============
+
+In general, we want to develop APIs that support basic life cycle management
+(LCM) for Cyborg.
+
+Life Cycle Management Phases
+----------------------------
+
+For Cyborg, the LCM phases include the typical create, retrieve, update and
+delete operations. Note that deprovisioning mainly refers to the detach
+(delete) operation, which deactivates an acceleration capability but
+preserves the resource itself for future use. From a functional point of
+view, the Cyborg LCM includes provision, attach, update, list and detach.
+The Cyborg API has no notion of deprovisioning in the sense of
+decommissioning an entire accelerator device or disconnecting it from the
+bus.
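+
+The LCM phases above can be pictured as a small state machine. The sketch
+below is illustrative only: the state names and the transition table are
+assumptions made for this example, since this spec does not define an
+accelerator status field. Provision (create) yields the initial state,
+update and list leave the state unchanged, and detach returns the
+accelerator to a reusable state rather than removing it.
+
+.. code-block:: python
+
+    from enum import Enum
+
+
+    class AccState(Enum):
+        # Hypothetical states; this spec defines no status field.
+        PROVISIONED = "provisioned"  # result of provision (create)
+        ATTACHED = "attached"
+        DETACHED = "detached"
+
+
+    # (operation, current state) -> next state. Update and list do not
+    # change the state. There is deliberately no "deprovision" operation:
+    # detach keeps the accelerator available for future use.
+    TRANSITIONS = {
+        ("attach", AccState.PROVISIONED): AccState.ATTACHED,
+        ("attach", AccState.DETACHED): AccState.ATTACHED,
+        ("detach", AccState.ATTACHED): AccState.DETACHED,
+    }
+
+
+    def next_state(operation, state):
+        """Return the next state, or raise ValueError for an invalid move."""
+        try:
+            return TRANSITIONS[(operation, state)]
+        except KeyError:
+            raise ValueError(
+                f"cannot {operation} an accelerator that is {state.value}"
+            ) from None
+
+
+    state = next_state("attach", AccState.PROVISIONED)
+    state = next_state("detach", state)
+    print(state.value)  # detached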
+
+Difference between Provision and Attach/Detach
+----------------------------------------------
+
+Note that while the APIs support provisioning via CRUD operations,
+attach/detach are considered different:
+
+* Provision operations (create) will involve the api -> conductor -> agent
+  -> driver workflow, whereas attach/detach (update/delete) could be taken
+  care of at the driver layer without involving that workflow. This is
+  similar to the difference between creating a volume and attaching or
+  detaching a volume in Cinder.
+
+* The attach/detach operations in the Cyborg API will mainly involve DB
+  status modification.
+
+Difference between Attach/Detach To VM and Host
+-----------------------------------------------
+
+There are also differences between attaching an accelerator to a VM and
+attaching it to a host, similar to Cinder.
+
+* When the attachment is to a VM, we expect that Nova could call the virt
+  driver to perform the action for the instance. In this case Nova needs to
+  support the acc-attach and acc-detach actions.
+
+* When the attachment is to a host, we expect that Cyborg could take care
+  of the action itself via the Cyborg driver. Although there is currently a
+  generic driver to accomplish the job, we should consider an os-brick-like
+  standalone library for accelerator attach/detach operations.
+
+Alternatives
+------------
+
+* For attaching an accelerator to a VM, we could let Cyborg perform the
+  action itself. However, this risks tight coupling with Nova, since Cyborg
+  would need to get instance-related information from it.
+* For attaching an accelerator to a host, we could consider using Ironic
+  drivers. However, this might not fit well with standalone accelerator rack
+  scenarios, where accelerators are not attached to a server at all.
+
+Data model impact
+-----------------
+
+A new table in the API database will be created::
+
+    CREATE TABLE accelerators (
+        accelerator_id INT NOT NULL,
+        device_type VARCHAR(255) NOT NULL,
+        acc_type VARCHAR(255) NOT NULL,
+        acc_capability VARCHAR(255) NOT NULL,
+        vendor_id VARCHAR(255),
+        product_id VARCHAR(255),
+        remotable INT,
+        PRIMARY KEY (accelerator_id)
+    );
+
+Note that there is an ongoing discussion on new data structures for nested
+resource providers that will impact the Cyborg DB implementation. The code
+implementation should be aligned with the resource provider DB requirements
+as much as possible.
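+
+As a quick sanity check, the sketch below creates the table defined above,
+written with portable column types and an explicit primary key, in an
+in-memory SQLite database using Python's standard `sqlite3` module. SQLite
+is only a stand-in here; the production backend and the example row values
+are assumptions, and the final schema may change to follow the resource
+provider model.
+
+.. code-block:: python
+
+    import sqlite3
+
+    # SQLite is used only as a dependency-free way to check the DDL.
+    DDL = """
+    CREATE TABLE accelerators (
+        accelerator_id INT NOT NULL,
+        device_type VARCHAR(255) NOT NULL,
+        acc_type VARCHAR(255) NOT NULL,
+        acc_capability VARCHAR(255) NOT NULL,
+        vendor_id VARCHAR(255),
+        product_id VARCHAR(255),
+        remotable INT,
+        PRIMARY KEY (accelerator_id)
+    );
+    """
+
+    conn = sqlite3.connect(":memory:")
+    conn.execute(DDL)
+
+    # Illustrative row; all values are made up for this example.
+    conn.execute(
+        "INSERT INTO accelerators (accelerator_id, device_type, acc_type,"
+        " acc_capability, vendor_id, product_id, remotable)"
+        " VALUES (?, ?, ?, ?, ?, ?, ?)",
+        (1, "CRYPTO", "ipsec", "aes", "example-vendor", "example-product", 0),
+    )
+
+    # Local (non-remotable) accelerators.
+    rows = conn.execute(
+        "SELECT accelerator_id, device_type FROM accelerators"
+        " WHERE remotable = 0"
+    ).fetchall()
+    print(rows)  # [(1, 'CRYPTO')]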
+
+
+REST API impact
+---------------
+
+The API changes add resource endpoints to:
+
+* `GET` a list of all the accelerators
+* `GET` a single accelerator for a given id
+* `POST` create a new accelerator resource
+* `PUT` an update to an existing accelerator spec
+* `PUT` attach an accelerator to a VM or a host
+* `DELETE` detach an existing accelerator for a given id
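+
+One way to map these endpoints onto handlers is a method-and-path routing
+table, sketched below with Python's standard `re` module. The handler names
+are hypothetical, `POST` is assumed to target the collection URL
+`/accelerators`, and the `{acc_spec}` segment is treated as an opaque path
+component because its exact meaning is not yet defined. The longer `PUT`
+route is checked first so that spec updates are not mistaken for attach
+requests.
+
+.. code-block:: python
+
+    import re
+
+    UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
+    ITEM = rf"/accelerators/(?P<uuid>{UUID})"
+
+    # Hypothetical handler names. Order matters: the longer PUT route is
+    # tried before the plain PUT (attach) route.
+    ROUTES = [
+        ("GET", r"/accelerators", "list_accelerators"),
+        ("POST", r"/accelerators", "create_accelerator"),  # assumed URL
+        ("GET", ITEM, "get_accelerator"),
+        ("PUT", ITEM + r"/(?P<acc_spec>[^/]+)", "update_spec"),
+        ("PUT", ITEM, "attach_accelerator"),
+        ("DELETE", ITEM, "detach_accelerator"),
+    ]
+
+
+    def dispatch(method, path):
+        """Return (handler_name, path_params), or None if nothing matches."""
+        for route_method, pattern, handler in ROUTES:
+            match = re.fullmatch(pattern, path)
+            if route_method == method and match:
+                return handler, match.groupdict()
+        return None
+
+
+    print(dispatch("DELETE", "/accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e"))
+    # ('detach_accelerator', {'uuid': '8e45a2ea-5364-4b0d-a252-bf8becaa606e'})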
+
+The following new REST API calls will be created:
+
+'GET /accelerators'
+*************************
+
+Return a list of accelerators managed by Cyborg.
+
+Example message body of the response to the GET operation::
+
+    200 OK
+    Content-Type: application/json
+
+    {
+       "accelerator":[
+          {
+             "uuid":"8e45a2ea-5364-4b0d-a252-bf8becaa606e",
+             "acc_specs":{
+                "remote":0,
+                "num":1,
+                "device_type":"CRYPTO",
+                "acc_capability":{
+                   "num":2,
+                   "ipsec":{
+                      "aes":{
+                         "3des":50,
+                         "num":1
+                      }
+                   }
+                }
+             }
+          },
+          {
+             "uuid":"eaaf1c04-ced2-40e4-89a2-87edded06d64",
+             "acc_specs":{
+                "remote":0,
+                "num":1,
+                "device_type":"CRYPTO",
+                "acc_capability":{
+                   "num":2,
+                   "ipsec":{
+                      "aes":{
+                         "3des":40,
+                         "num":1
+                      }
+                   }
+                }
+             }
+          }
+       ]
+    }
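+
+A client can filter this list locally. The sketch below parses a compact
+copy of the example response with Python's standard `json` module; the
+selection criteria (local `CRYPTO` devices) are only an illustration and
+not part of the proposal.
+
+.. code-block:: python
+
+    import json
+
+    # Compact copy of the example response body above.
+    body = """
+    {"accelerator": [
+      {"uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e",
+       "acc_specs": {"remote": 0, "num": 1, "device_type": "CRYPTO",
+                     "acc_capability": {"num": 2, "ipsec": {"aes": {"3des": 50, "num": 1}}}}},
+      {"uuid": "eaaf1c04-ced2-40e4-89a2-87edded06d64",
+       "acc_specs": {"remote": 0, "num": 1, "device_type": "CRYPTO",
+                     "acc_capability": {"num": 2, "ipsec": {"aes": {"3des": 40, "num": 1}}}}}
+    ]}
+    """
+
+    data = json.loads(body)
+
+    # Illustrative filter: local (non-remote) CRYPTO accelerators.
+    local_crypto = [
+        acc["uuid"]
+        for acc in data["accelerator"]
+        if acc["acc_specs"]["device_type"] == "CRYPTO"
+        and acc["acc_specs"]["remote"] == 0
+    ]
+    print(local_crypto)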
+
+'GET /accelerators/{uuid}'
+**************************
+
+Retrieve information about the accelerator identified by `{uuid}`.
+
+Example GET Request::
+
+    GET /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    200 OK
+    Content-Type: application/json
+
+    {
+       "uuid":"8e45a2ea-5364-4b0d-a252-bf8becaa606e",
+       "acc_specs":{
+          "remote":0,
+          "num":1,
+          "device_type":"CRYPTO",
+          "acc_capability":{
+             "num":2,
+             "ipsec":{
+                "aes":{
+                   "3des":50,
+                   "num":1
+                }
+             }
+          }
+       }
+    }
+
+If the accelerator does not exist, a `404 Not Found` must be
+returned.
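+
+A minimal sketch of the lookup behind this endpoint, assuming a
+hypothetical in-memory dictionary in place of the Cyborg DB. The handler
+returns the status line and JSON body directly instead of being wired into
+a real web framework.
+
+.. code-block:: python
+
+    import json
+
+    # Hypothetical in-memory store standing in for the Cyborg DB.
+    ACCELERATORS = {
+        "8e45a2ea-5364-4b0d-a252-bf8becaa606e": {
+            "remote": 0,
+            "num": 1,
+            "device_type": "CRYPTO",
+        },
+    }
+
+
+    def get_accelerator(acc_uuid):
+        """Return (status_line, json_body) for GET /accelerators/{uuid}."""
+        specs = ACCELERATORS.get(acc_uuid)
+        if specs is None:
+            return "404 Not Found", None
+        return "200 OK", json.dumps({"uuid": acc_uuid, "acc_specs": specs})
+
+
+    print(get_accelerator("8e45a2ea-5364-4b0d-a252-bf8becaa606e")[0])  # 200 OK
+    print(get_accelerator("00000000-0000-0000-0000-000000000000")[0])  # 404 Not Found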
+
+'POST /accelerators'
+********************
+
+Create a new accelerator.
+
+Example POST request::
+
+    POST /accelerators
+
+    Content-Type: application/json
+
+    {
+        "name": "IPSec Card",
+        "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    }
+
+The body of the request must match the following JSONSchema document::
+
+    {
+        "type": "object",
+        "properties": {
+            "name": {
+                "type": "string"
+            },
+            "uuid": {
+                "type": "string",
+                "format": "uuid"
+            }
+        },
+        "required": [
+            "name"
+        ],
+        "additionalProperties": false
+    }
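+
+The schema can be enforced with any JSON Schema validator. The sketch below
+is a hand-written, standard-library-only equivalent that makes the rules
+explicit: `name` is a required string, `uuid` is an optional hyphenated UUID
+string, and no other properties are accepted. The function name is
+hypothetical.
+
+.. code-block:: python
+
+    import uuid
+
+    ALLOWED = {"name", "uuid"}
+
+
+    def validate_create_body(body):
+        """Return a list of error strings; an empty list means valid."""
+        if not isinstance(body, dict):
+            return ["body must be a JSON object"]
+        errors = []
+        if "name" not in body:
+            errors.append("'name' is required")
+        elif not isinstance(body["name"], str):
+            errors.append("'name' must be a string")
+        if "uuid" in body:
+            value = body["uuid"]
+            if not isinstance(value, str):
+                errors.append("'uuid' must be a string")
+            else:
+                try:
+                    # str(UUID) is the canonical lower-case hyphenated form.
+                    canonical = str(uuid.UUID(value))
+                except ValueError:
+                    canonical = None
+                if canonical != value.lower():
+                    errors.append("'uuid' must be a hyphenated UUID string")
+        extra = set(body) - ALLOWED
+        if extra:
+            # Mirrors "additionalProperties": false.
+            errors.append(f"unexpected properties: {sorted(extra)}")
+        return errors
+
+
+    print(validate_create_body(
+        {"name": "IPSec Card", "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"}
+    ))  # []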
+
+The response body is empty. The headers include a location header
+pointing to the created accelerator resource::
+
+    201 Created
+    Location: /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+A `409 Conflict` response code will be returned if another accelerator
+exists with the provided name.
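+
+The create semantics can be sketched as follows, assuming a hypothetical
+in-memory store keyed by UUID and a request body that has already passed
+schema validation. A UUID is generated when the caller omits one. Treating
+a reused UUID as a `409 Conflict` is an additional assumption; the spec only
+mentions name clashes.
+
+.. code-block:: python
+
+    import uuid
+
+    # Hypothetical in-memory store mapping accelerator UUID -> name.
+    ACCELERATORS = {}
+
+
+    def create_accelerator(body):
+        """Return (status_line, headers); body is assumed pre-validated."""
+        if body["name"] in ACCELERATORS.values():
+            return "409 Conflict", {}
+        # 'uuid' is optional in the schema; generate one when it is absent.
+        acc_uuid = body.get("uuid") or str(uuid.uuid4())
+        if acc_uuid in ACCELERATORS:
+            # Assumption: a reused UUID is also treated as a conflict.
+            return "409 Conflict", {}
+        ACCELERATORS[acc_uuid] = body["name"]
+        # The response body is empty; only the Location header is set.
+        return "201 Created", {"Location": f"/accelerators/{acc_uuid}"}
+
+
+    status, headers = create_accelerator(
+        {"name": "IPSec Card", "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"}
+    )
+    print(status, headers["Location"])
+    # 201 Created /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+    print(create_accelerator({"name": "IPSec Card"})[0])  # 409 Conflict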
+
+'PUT /accelerators/{uuid}/{acc_spec}'
+*************************************
+
+Update the spec for the accelerator identified by `{uuid}`.
+
+Example::
+
+    PUT /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    Content-type: application/json
+
+    {
+       "acc_specs":{
+          "remote":0,
+          "num":1,
+          "device_type":"CRYPTO",
+          "acc_capability":{
+             "num":2,
+             "ipsec":{
+                "aes":{
+                   "3des":50,
+                   "num":1
+                }
+             }
+          }
+       }
+    }
+
+The returned HTTP response code will be one of the following:
+
+* `200 OK` if the spec is successfully updated
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found
+* `400 Bad Request` for bad or invalid syntax
+* `409 Conflict` if another process updated the same spec.
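+
+The spec does not say how the `409 Conflict` case is detected. One common
+approach, sketched below as an assumption, is optimistic concurrency: each
+record carries a generation counter (similar in spirit to resource provider
+generations in the Placement API), and an update succeeds only if the
+caller's expected generation still matches. How the client supplies the
+expected generation is left open here, and the record layout is
+hypothetical.
+
+.. code-block:: python
+
+    # Hypothetical records; 'generation' is bumped on every successful
+    # update (an assumption, not part of this spec).
+    RECORDS = {
+        "8e45a2ea-5364-4b0d-a252-bf8becaa606e": {
+            "generation": 1,
+            "acc_specs": {"remote": 0, "num": 1, "device_type": "CRYPTO"},
+        },
+    }
+
+
+    def update_spec(acc_uuid, body, expected_generation):
+        """Return the status line for a spec update request."""
+        record = RECORDS.get(acc_uuid)
+        if record is None:
+            return "404 Not Found"
+        if not isinstance(body, dict) or not isinstance(body.get("acc_specs"), dict):
+            return "400 Bad Request"
+        if record["generation"] != expected_generation:
+            # Another writer changed the spec since the caller last read it.
+            return "409 Conflict"
+        record["acc_specs"] = body["acc_specs"]
+        record["generation"] += 1
+        return "200 OK"
+
+
+    uid = "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    print(update_spec(uid, {"acc_specs": {"num": 2}}, expected_generation=1))
+    # 200 OK
+    print(update_spec(uid, {"acc_specs": {"num": 3}}, expected_generation=1))
+    # 409 Conflict (the generation is now 2)
+
+In a real database the comparison and the write would need to be a single
+atomic step, for example an `UPDATE ... WHERE generation = ?` whose affected
+row count tells the caller whether it won the race.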
+
+
+'PUT /accelerators/{uuid}'
+**************************
+
+Attach the accelerator identified by `{uuid}`.
+
+Example::
+
+    PUT /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    Content-type: application/json
+
+    {
+        "name": "IPSec Card",
+        "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    }
+
+The returned HTTP response code will be one of the following:
+
+* `200 OK` if the accelerator is successfully attached
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found
+* `400 Bad Request` for bad or invalid syntax
+* `409 Conflict` if another process has already attached the same
+  accelerator.
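+
+A minimal sketch of the attach bookkeeping, assuming a hypothetical table
+that maps each accelerator UUID to its attachment target (a VM or host
+identifier). The example request body does not carry the target, so passing
+it as a separate `target` argument is an assumption. Consistent with the
+note that attach/detach mainly modify DB status, no driver call is made.
+
+.. code-block:: python
+
+    # Hypothetical attachment table: accelerator UUID -> attachment target
+    # (a VM or host identifier), or None while unattached.
+    ATTACHMENTS = {"8e45a2ea-5364-4b0d-a252-bf8becaa606e": None}
+
+
+    def attach(acc_uuid, target):
+        """Return the status line for an attach request (status change only)."""
+        if acc_uuid not in ATTACHMENTS:
+            return "404 Not Found"
+        if not isinstance(target, str) or not target:
+            return "400 Bad Request"
+        if ATTACHMENTS[acc_uuid] is not None:
+            # Already attached, for example by a concurrent request.
+            return "409 Conflict"
+        ATTACHMENTS[acc_uuid] = target
+        return "200 OK"
+
+
+    uid = "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    print(attach(uid, "vm-1"))  # 200 OK
+    print(attach(uid, "vm-2"))  # 409 Conflict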
+
+
+'DELETE /accelerators/{uuid}'
+*****************************
+
+Detach the accelerator identified by `{uuid}`.
+
+The bodies of the request and the response are empty.
+
+The returned HTTP response code will be one of the following:
+
+* `204 No Content` if the request was successful and the accelerator was detached.
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found.
+* `409 Conflict` if allocation records exist for any of the accelerator
+  resources that would be affected by detaching the accelerator.
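+
+The detach rules can be sketched the same way, assuming hypothetical
+per-accelerator `attached` flags and allocation lists held in memory.
+Detach only flips the status; the accelerator record itself is kept for
+future use, matching the absence of a deprovision step.
+
+.. code-block:: python
+
+    # Hypothetical per-accelerator state: attachment flag and the
+    # allocation records that still reference the accelerator.
+    ACCELERATORS = {
+        "8e45a2ea-5364-4b0d-a252-bf8becaa606e": {
+            "attached": True, "allocations": ["alloc-1"]},
+        "eaaf1c04-ced2-40e4-89a2-87edded06d64": {
+            "attached": True, "allocations": []},
+    }
+
+
+    def detach(acc_uuid):
+        """Return the status line for a detach (DELETE) request."""
+        record = ACCELERATORS.get(acc_uuid)
+        if record is None:
+            return "404 Not Found"
+        if record["allocations"]:
+            # Detaching would leave allocation records pointing at it.
+            return "409 Conflict"
+        # Only the status changes; the record is kept for future use.
+        record["attached"] = False
+        return "204 No Content"
+
+
+    print(detach("8e45a2ea-5364-4b0d-a252-bf8becaa606e"))  # 409 Conflict
+    print(detach("eaaf1c04-ced2-40e4-89a2-87edded06d64"))  # 204 No Content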
+
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+Developers can use this REST API after it has been implemented.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  zhipengh <huangzhipeng@huawei.com>
+
+Work Items
+----------
+
+* Implement the APIs specified in this spec
+* Propose the new accelerator attach/detach API to Nova
+* Implement the DB specified in this spec
+
+
+Dependencies
+============
+
+None.
+
+Testing
+=======
+
+* Unit tests will be added to Cyborg API.
+
+Documentation Impact
+====================
+
+None
+
+References
+==========
+
+None
+
+History
+=======
+
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release
+     - Description
+   * - Pike
+     - Introduced
--- a/specs/template.rst
+++ b/specs/template.rst
@@ -0,0 +1,392 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+Example Spec - The title of your blueprint
+==========================================
+
+Include the URL of your launchpad blueprint:
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/example
+
+Introduction paragraph -- why are we doing anything? A single paragraph of
+prose that operators can understand. The title and this first paragraph
+should be used as the subject line and body of the commit message
+respectively.
+
+Some notes about the cyborg-spec and blueprint process:
+
+* Not all blueprints need a spec. For more information see
+  http://docs.openstack.org/developer/cyborg/blueprints.html#specs
+
+* The aim of this document is first to define the problem we need to solve,
+  and second to agree on the overall approach to solve that problem.
+
+* This is not intended to be extensive documentation for a new feature.
+  For example, there is no need to specify the exact configuration changes,
+  nor the exact details of any DB model changes. But you should still define
+  that such changes are required, and be clear on how that will affect
+  upgrades.
+
+* You should aim to get your spec approved before writing your code.
+  While you are free to write prototypes and code before getting your spec
+  approved, it's possible that the outcome of the spec review process leads
+  you towards a fundamentally different solution than you first envisaged.
+
+* But, API changes are held to a much higher level of scrutiny.
+  As soon as an API change merges, we must assume it could be in production
+  somewhere, and as such, we then need to support that API change forever.
+  To avoid getting that wrong, we do want lots of details about API changes
+  upfront.
+
+Some notes about using this template:
+
+* Your spec should be in reStructuredText, like this template.
+
+* Please wrap text at 79 columns.
+
+* The filename in the git repository should match the launchpad URL, for
+  example a URL of: https://blueprints.launchpad.net/openstack-cyborg/+spec/awesome-thing
+  should be named awesome-thing.rst
+
+* Please do not delete any of the sections in this template.  If you have
+  nothing to say for a whole section, just write: None
+
+* For help with syntax, see http://sphinx-doc.org/rest.html
+
+* To test out your formatting, build the docs using tox and see the generated
+  HTML file in doc/build/html/specs/<path_of_your_file>
+
+* If you would like to provide a diagram with your spec, ascii diagrams are
+  required.  http://asciiflow.com/ is a very nice tool to assist with making
+  ascii diagrams.  The reason for this is that the tool used to review specs is
+  based purely on plain text.  Plain text will allow review to proceed without
+  having to look at additional files which cannot be viewed in Gerrit.  It
+  will also allow inline feedback on the diagram itself.
+
+* If your specification proposes any changes to the Cyborg REST API such
+  as changing parameters which can be returned or accepted, or even
+  the semantics of what happens when a client calls into the API, then
+  you should add the APIImpact flag to the commit message. Specifications with
+  the APIImpact flag can be found with the following query:
+
+  https://review.openstack.org/#/q/status:open+project:openstack/cyborg+message:apiimpact,n,z
+
+
+Problem description
+===================
+
+A detailed description of the problem. What problem is this blueprint
+addressing?
+
+Use Cases
+---------
+
+What use cases does this address? What impact on actors does this change have?
+Ensure you are clear about the actors in each use case: Developer, End User,
+Deployer etc.
+
+Proposed change
+===============
+
+Here is where you cover the change you propose to make in detail. How do you
+propose to solve this problem?
+
+If this is one part of a larger effort make it clear where this piece ends. In
+other words, what's the scope of this effort?
+
+At this point, if you would like to just get feedback on if the problem and
+proposed change fit in Cyborg, you can stop here and post this for review to get
+preliminary feedback. If so please say:
+Posting to get preliminary feedback on the scope of this spec.
+
+Alternatives
+------------
+
+What other ways could we do this thing? Why aren't we using those? This doesn't
+have to be a full literature review, but it should demonstrate that thought has
+been put into why the proposed solution is an appropriate one.
+
+Data model impact
+-----------------
+
+Changes which require modifications to the data model often have a wider impact
+on the system.  The community often has strong opinions on how the data model
+should be evolved, from both a functional and performance perspective. It is
+therefore important to capture and gain agreement as early as possible on any
+proposed changes to the data model.
+
+Questions which need to be addressed by this section include:
+
+* What new data objects and/or database schema changes is this going to
+  require?
+
+* What database migrations will accompany this change?
+
+* How will the initial set of new data objects be generated, for example if you
+  need to take into account existing instances, or modify other existing data
+  describe how that will work.
+
+REST API impact
+---------------
+
+Each API method which is either added or changed should have the following
+
+* Specification for the method
+
+  * A description of what the method does suitable for use in
+    user documentation
+
+  * Method type (POST/PUT/GET/DELETE)
+
+  * Normal http response code(s)
+
+  * Expected error http response code(s)
+
+    * A description for each possible error code should be included
+      describing semantic errors which can cause it such as
+      inconsistent parameters supplied to the method, or when an
+      instance is not in an appropriate state for the request to
+      succeed. Errors caused by syntactic problems covered by the JSON
+      schema definition do not need to be included.
+
+  * URL for the resource
+
+    * URL should not include underscores, and use hyphens instead.
+
+  * Parameters which can be passed via the url
+
+  * JSON schema definition for the request body data if allowed
+
+    * Field names should use snake_case style, not CamelCase or MixedCase
+      style.
+
+  * JSON schema definition for the response body data if any
+
+    * Field names should use snake_case style, not CamelCase or MixedCase
+      style.
+
+* Example use case including typical API samples for both data supplied
+  by the caller and the response
+
+* Discuss any policy changes, and discuss what things a deployer needs to
+  think about when defining their policy.
+
+Note that the schema should be defined as restrictively as
+possible. Parameters which are required should be marked as such and
+only under exceptional circumstances should additional parameters
+which are not defined in the schema be permitted (e.g.
+additionalProperties should be false).
+
+Reuse of existing predefined parameter types such as regexps for
+passwords and user defined names is highly encouraged.
+
+Security impact
+---------------
+
+Describe any potential security impact on the system.  Some of the items to
+consider include:
+
+* Does this change touch sensitive data such as tokens, keys, or user data?
+
+* Does this change alter the API in a way that may impact security, such as
+  a new way to access sensitive information or a new way to login?
+
+* Does this change involve cryptography or hashing?
+
+* Does this change require the use of sudo or any elevated privileges?
+
+* Does this change involve using or parsing user-provided data? This could
+  be directly at the API level or indirectly such as changes to a cache layer.
+
+* Can this change enable a resource exhaustion attack, such as allowing a
+  single API interaction to consume significant server resources? Some examples
+  of this include launching subprocesses for each connection, or entity
+  expansion attacks in XML.
+
+For more detailed guidance, please see the OpenStack Security Guidelines as
+a reference (https://wiki.openstack.org/wiki/Security/Guidelines).  These
+guidelines are a work in progress and are designed to help you identify
+security best practices.  For further information, feel free to reach out
+to the OpenStack Security Group at openstack-security@lists.openstack.org.
+
+Notifications impact
+--------------------
+
+Please specify any changes to notifications. Be that an extra notification,
+changes to an existing notification, or removing a notification.
+
+Other end user impact
+---------------------
+
+Aside from the API, are there other ways a user will interact with this
+feature?
+
+* Does this change have an impact on python-cyborgclient? What does the user
+  interface there look like?
+
+Performance Impact
+------------------
+
+Describe any potential performance impact on the system, for example
+how often will new code be called, and is there a major change to the calling
+pattern of existing code.
+
+Examples of things to consider here include:
+
+* A periodic task might look like a small addition but if it calls conductor or
+  another service the load is multiplied by the number of nodes in the system.
+
+* Scheduler filters get called once per host for every instance being created,
+  so any latency they introduce is linear with the size of the system.
+
+* A small change in a utility function or a commonly used decorator can have a
+  large impact on performance.
+
+* Calls which result in database queries (whether direct or via conductor)
+  can have a profound impact on performance when called in critical sections of
+  the code.
+
+* Will the change include any locking, and if so what considerations are there
+  on holding the lock?
+
+Other deployer impact
+---------------------
+
+Discuss things that will affect how you deploy and configure OpenStack
+that have not already been mentioned, such as:
+
+* What config options are being added? Should they be more generic than
+  proposed (for example a flag that other hypervisor drivers might want to
+  implement as well)? Are the default values ones which will work well in
+  real deployments?
+
+* Is this a change that takes immediate effect after it's merged, or is it
+  something that has to be explicitly enabled?
+
+* If this change is a new binary, how would it be deployed?
+
+* Please state anything that those doing continuous deployment, or those
+  upgrading from the previous release, need to be aware of. Also describe
+  any plans to deprecate configuration values or features.  For example, if we
+  change the directory name that instances are stored in, how do we handle
+  instance directories created before the change landed?  Do we move them?  Do
+  we have a special case in the code? Do we assume that the operator will
+  recreate all the instances in their cloud?
+
+Developer impact
+----------------
+
+Discuss things that will affect other developers working on OpenStack,
+such as:
+
+* If the blueprint proposes a change to the driver API, discussion of how
+  other hypervisors would implement the feature is required.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Who is leading the writing of the code? Or is this a blueprint where you're
+throwing it out there to see who picks it up?
+
+If more than one person is working on the implementation, please designate the
+primary author and contact.
+
+Primary assignee:
+  <launchpad-id or None>
+
+Other contributors:
+  <launchpad-id or None>
+
+Work Items
+----------
+
+Work items or tasks -- break the feature up into the things that need to be
+done to implement it. Those parts might end up being done by different people,
+but we're mostly trying to understand the timeline for implementation.
+
+
+Dependencies
+============
+
+* Include specific references to specs and/or blueprints in cyborg, or in other
+  projects, that this one either depends on or is related to.
+
+* If this requires functionality of another project that is not currently used
+  by Cyborg, document that fact.
+
+* Does this feature require any new library dependencies or code otherwise not
+  included in OpenStack? Or does it depend on a specific version of library?
+
+
+Testing
+=======
+
+Please discuss the important scenarios needed to test here, as well as
+specific edge cases we should be ensuring work correctly. For each
+scenario please specify if this requires specialized hardware, a full
+OpenStack environment, or can be simulated inside the Cyborg tree.
+
+Please discuss how the change will be tested. We especially want to know what
+tempest tests will be added. It is assumed that unit test coverage will be
+added so that doesn't need to be mentioned explicitly, but discussion of why
+you think unit tests are sufficient and we don't need to add more tempest
+tests would need to be included.
+
+Is this untestable in gate given current limitations (specific hardware /
+software configurations available)? If so, are there mitigation plans (3rd
+party testing, gate enhancements, etc).
+
+
+Documentation Impact
+====================
+
+Which audiences are affected most by this change, and which documentation
+titles on docs.openstack.org should be updated because of this change? Don't
+repeat details discussed above, but reference them here in the context of
+documentation for multiple audiences. For example, the Operations Guide targets
+cloud operators, and the End User Guide would need to be updated if the change
+offers a new feature available through the CLI or dashboard. If a config option
+changes or is deprecated, note here that the documentation needs to be updated
+to reflect this specification's change.
+
+References
+==========
+
+Please add any useful references here. You are not required to have any
+reference. Moreover, this specification should still make sense when your
+references are unavailable. Examples of what you could include are:
+
+* Links to mailing list or IRC discussions
+
+* Links to notes from a summit session
+
+* Links to relevant research, if appropriate
+
+* Related specifications as appropriate (e.g.  if it's an EC2 thing, link the
+  EC2 docs)
+
+* Anything else you feel it is worthwhile to refer to
+
+
+History
+=======
+
+Optional section intended to be used each time the spec is updated to describe
+new design, API or any database schema updated. Useful to let reader understand
+what's happened along the time.
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Pike
+     - Introduced