Configuring Compute Service Groups

To effectively manage and utilize compute nodes, the Compute service must know their statuses. For example, when a user launches a new VM, the Compute scheduler sends the request to a live node (one with enough capacity, of course). As of the Grizzly release, the Compute service queries the ServiceGroup API to get node liveness information. When a compute worker (running the nova-compute daemon) starts, it calls the join API to join the compute group, so that every service interested in this information (for example, the scheduler) can query the group membership or the status of a particular node. Internally, the ServiceGroup client driver automatically updates the compute worker status. Two drivers are currently implemented: database and ZooKeeper. Further drivers, such as one based on memcache, are in review or development.
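As a rough illustration of how services consume this API, the following Python sketch shows a worker joining its group and a scheduler-style liveness check. The module path nova.servicegroup and the join() and service_is_up() calls reflect the Grizzly-era code base; treat the exact signatures as assumptions to verify against your release.

# A minimal sketch, not production code: error handling and configuration
# loading are omitted, and the call signatures should be checked against
# your Nova release.
from nova import servicegroup

def register_compute_worker(host):
    # Called by a compute worker at startup to join the 'compute' group;
    # the configured ServiceGroup driver records the membership.
    sg_api = servicegroup.API()
    sg_api.join(host, 'compute')
    return sg_api

def filter_live_services(sg_api, candidate_services):
    # Scheduler-side check: keep only the service records that the
    # ServiceGroup driver currently reports as alive.
    return [svc for svc in candidate_services if sg_api.service_is_up(svc)]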
Database ServiceGroup driver

By default, Compute uses the database driver to track node liveness. In a compute worker, this driver periodically sends a database update, saying "I'm OK" with a timestamp. A pre-defined timeout (service_down_time) determines whether a node is dead.

The driver has limitations, which may or may not be an issue for you, depending on your setup. The more compute worker nodes you have, the more pressure you put on the database. By default, the timeout is 60 seconds, so it can take some time to detect node failures. You could reduce the timeout value, but then the database updates must also be sent more frequently, which again increases the database workload. Fundamentally, the data that describes whether a node is alive is transient: after a few seconds, it is obsolete. Other data in the database is persistent, such as the entries that describe who owns which VMs. However, because both kinds of data are stored in the same database, they are treated the same way. The ServiceGroup abstraction aims to treat them separately.
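If you stay with the database driver, the relevant options live in the [DEFAULT] section of /etc/nova/nova.conf. The option names below (servicegroup_driver, report_interval, service_down_time) are the standard ones; the values shown are illustrative, so verify the defaults for your release:

servicegroup_driver="db"
# Seconds between the worker's "I'm OK" database updates.
report_interval=10
# Seconds without an update before a node is considered dead.
service_down_time=60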
ZooKeeper ServiceGroup driver

The ZooKeeper ServiceGroup driver works by using ZooKeeper ephemeral nodes. ZooKeeper, in contrast to databases, is a distributed system and its load is divided among several servers. When a compute worker node establishes a ZooKeeper session, it creates an ephemeral znode in the group directory. Ephemeral znodes have the same lifespan as the session: if the worker node or the nova-compute daemon crashes, or a network partition occurs between the worker and the ZooKeeper server quorum, the ephemeral znode is removed automatically. The driver gets the group membership by running the ls command in the group directory.

To use the ZooKeeper driver, you must install ZooKeeper servers and client libraries. Setting up ZooKeeper servers is outside the scope of this article. For the rest of the article, assume these servers are installed and that their addresses and ports are 192.168.2.1:2181, 192.168.2.2:2181, and 192.168.2.3:2181.

To use ZooKeeper, you must install client-side Python libraries on every nova node: python-zookeeper (the official ZooKeeper Python binding) and evzookeeper (the library that makes the binding work with the eventlet threading model).

The relevant configuration snippet in the /etc/nova/nova.conf file on every node is:

servicegroup_driver="zk"

[zookeeper]
address="192.168.2.1:2181,192.168.2.2:2181,192.168.2.3:2181"
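Whichever driver you configure, you can check that liveness reporting works from the controller node. The steps below are a sketch: the service name and init command are Ubuntu-style assumptions, and the nova-manage output format reflects Grizzly-era releases.

# Restart the compute service so it re-joins the group through the new driver.
service nova-compute restart
# On the controller, list services; live nodes show ":-)", dead ones "XXX".
nova-manage service list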