==============================
Deploy Kolla images with Mesos
==============================

https://blueprints.launchpad.net/kolla/+spec/mesos

Kolla deploys its containers using Ansible, but this is only one way to
deploy them. For example, TripleO deploys Kolla containers using Heat
in-guest agents. This specification defines support for deploying Kolla
containers using Mesos and Marathon.

What is Mesos?

From http://mesos.apache.org/: Mesos "provides efficient resource isolation
and sharing across distributed applications, or frameworks". The software
enables fine-grained resource sharing, which improves cluster utilization.

What is Marathon?

From https://mesosphere.github.io/marathon/: "A cluster-wide init and
control system for services in cgroups or Docker containers".

Adding Mesos/Marathon support to Kolla will enable those interested in
deploying OpenStack with Mesos to contribute to the Kolla community more
directly.

Problem description
===================

The current deployment (Ansible) is done somewhat serially: some services
depend on others, and the deployment is driven from the command line by a
user.

In addition to deployment, Mesos/Marathon provides the following features
that will eventually be used:

- life-cycle management: service monitoring, restart, scaling, and rolling
  restarts and upgrades
- constraints [1]: the Marathon scheduler will be used to place containers
  more effectively (especially during scaling and recovery)
- integration with core infrastructure services such as DNS, load
  balancing, service discovery, and service components.

To reuse as much functionality as possible, it would be best to build on an
existing framework that is proven, stable, and mature. Mesos/Marathon is
used and tested at scale by many large companies, which should give
operators confidence that an OpenStack deployment can meet their scaling
requirements.

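As an illustration of the constraints feature listed above, the following
minimal sketch builds a Marathon application definition in Python. The app
id, image name, resource values, and the helper name
``build_app_definition`` are hypothetical and are not defined by this
specification. The field names (``id``, ``container``, ``constraints``, and
so on) follow Marathon's application JSON format; such a document would
typically be submitted with a POST to Marathon's ``/v2/apps`` endpoint.

```python
import json


def build_app_definition(app_id, image, instances=1, cpus=0.5, mem=512):
    """Return a Marathon app definition (a plain dict) for one container.

    Every argument value used here is an illustrative assumption,
    not a value taken from the specification.
    """
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                # Host networking; the spec names net=host as a Docker
                # feature the deployment depends on.
                "network": "HOST",
            },
        },
        # Constraint: run at most one instance of this app per host.
        "constraints": [["hostname", "UNIQUE"]],
    }


if __name__ == "__main__":
    # Hypothetical app id and image name, for illustration only.
    app = build_app_definition(
        "/openstack/keystone", "example/keystone:latest", instances=3
    )
    print(json.dumps(app, indent=2, sort_keys=True))
```

With the ``UNIQUE`` operator on ``hostname``, scaling the app up or
recovering a failed instance would not stack two copies on one host, which
is the kind of placement control the constraints item refers to.
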

Marathon [2] will be used to manage the containers. Marathon is a framework
that runs on top of Mesos and is designed for long-running services.

Part of this change is to start all the containers at the same time (in
parallel) so that there are as few dependencies as possible from the
deployment tool’s point of view. This should:

- make the initial deployment faster
- reduce unnecessary restarts during upgrades
- make each container more self-sufficient

Proposed change
===============

- Add a deployment-specific git repo (kolla-mesos) to contain the
  Mesos/Marathon-specific deployment code and bootstrapping.
- Enhance the Kolla container API (config.json) to permit loading a custom
  startup script while maintaining immutability with copy_once.
- Implement an all-in-one (AIO) basic OpenStack deployment.
- Implement a separate controller/compute setup similar to the Ansible one.
- Add docs throughout to assist users, contributors, and reviewers.

Bootstrapping
-------------

At first, Mesos/Marathon/Zookeeper bootstrapping will be done by setting up
Docker containers. Later, bootstrapping will be handled by Ironic/PXE (the
aim is to be practical and do what is easiest for the AIO).

Dependency management
---------------------

Instead of serialising the dependent steps, each container is started
immediately and only starts its service once its requirements are
fulfilled. These dependencies will come in the form of:

- service discovery (service X needs service Y running).
  Note that Marathon DNS and LB can be self-configured based on service
  registry information. To achieve this, the container also needs to
  register itself once it has started.
- checking whether service configuration is complete (does Keystone have
  the required service user, is the DB schema complete, etc.). Zookeeper
  will be used to watch for these configuration steps.

One-time tasks
--------------

Ansible runs a number of scripts to set up the database, Keystone, etc.
These can be run as a Mesos Executor (a command line run in the container
of choice).

Security impact
---------------

Mesos and Marathon are mature products used in production by various
companies. The central configuration storage will require a careful
security risk assessment. The security of the deployed OpenStack should not
be affected by the deployment tool.

Performance Impact
------------------

Given that the Mesos slaves are distributed and all containers will be
started in parallel, the deployment *may* be faster, though this is not the
main focus.

Alternatives
------------

Kubernetes was evaluated by the Kolla team six months ago and found not to
work at that time, as it did not support the net=host and pid=host features
of Docker. It has since gained these features, so if Mesos/Marathon fails
to produce results, going back to Kubernetes is an option. However, at the
time of writing, Mesos/Marathon was deemed to be more mature and stable.

Implementation
==============

Primary Assignee(s)
-------------------

- Angus Salkeld (asalkeld)
- Kirill Proskurin (kproskurin)
- Michal Rostecki (nihilifer)

Other contributor(s):

- Harm Weites (harmw)
- Jeff Peeler (jpeeler)
- Michal Jastrzebski (inc0)
- Sam Yaple (SamYaple)
- Steven Dake (sdake)

<Please add your name here if you are getting involved in kolla-mesos>

Milestones
----------

Target Milestone for completion: Mitaka

Work Items
----------

1. Allow a custom startup script to run (change in Kolla).
2. Add startup scripts to kolla-mesos to read config from Zookeeper instead
   of a bind-mounted directory. Propose oslo.config changes to use this
   method (the oslo work will be done in parallel; initially this will be
   done in the startup script).
3. Add startup scripts for service discovery so that services only start
   once their needs are fulfilled:

   a. register a service once it is running
   b. wait for the services it depends on, if any, before starting a
      service
   c. DNS and LB self-configuration based on service registry information

4. Add bootstrapping code to install Marathon, Zookeeper, and the Mesos
   master and slave.
5. Add calls to Marathon to deploy containers.
6. Add support for kolla-mesos to kolla-cli.

Testing
=======

Functional tests will be implemented in the OpenStack check/gating system
to automatically check that the Mesos/Marathon deployment works for an AIO
environment.

Documentation Impact
====================

A quick start guide will be written to explain how to deploy. A developer
guide will be written on how to contribute and how the deployment works.

References
==========

- [1] https://mesosphere.github.io/marathon/docs/constraints.html
- [2] https://mesosphere.github.io/marathon/
- http://radar.oreilly.com/2015/10/swarm-v-fleet-v-kubernetes-v-mesos.html
- https://www.wehkamplabs.com/blog/2015/10/15/applying-consul-within-the-blaze-microservices-platform/