KATO Tomoyuki 8760c94427 [ops-guide] Cleanup maintenance chapter

Change-Id: I421caf2a12ab192d4df6d5c197e2c5dfb1c9c9bb
Implements: blueprint ops-guide-rst

2016-05-17 13:50:58 +00:00


Handling a Complete Failure

A common way to recover from a full system failure, such as a power outage in a data center, is to assign each service a priority and restore the services in that order. The following table shows an example.

Table. Example service restoration priority list
Priority	Services
1	Internal network connectivity
2	Backing storage services
3	Public network connectivity for user virtual machines
4	`nova-compute`, `nova-network`, cinder hosts
5	User virtual machines
10	Message queue and database services
15	Keystone services
20	`cinder-scheduler`
21	Image Catalog and Delivery services
22	`nova-scheduler` services
98	`cinder-api`
99	`nova-api` services
100	Dashboard node

Use this example priority list to ensure that services that affect users are restored as soon as possible, but not before a stable environment is in place. Although each step appears as a single line item, it requires significant work. For example, just after starting the database, you should check its integrity. Similarly, after starting the nova services, you should verify that the hypervisor state matches the database and fix any mismatches.
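The restore-then-verify loop implied by the priority list can be sketched in Python. This is only an illustration, not part of any OpenStack tooling: the `restore_in_order` function and the `start` and `verify` callbacks are hypothetical names, and the real work of starting a service or checking its integrity is site-specific and must be supplied by the operator.

```python
from typing import Callable, Iterable, List, Tuple

# Priorities copied from the example table; lower numbers are restored first.
RESTORE_PRIORITIES: List[Tuple[int, str]] = [
    (1, "Internal network connectivity"),
    (2, "Backing storage services"),
    (3, "Public network connectivity for user virtual machines"),
    (4, "nova-compute, nova-network, cinder hosts"),
    (5, "User virtual machines"),
    (10, "Message queue and database services"),
    (15, "Keystone services"),
    (20, "cinder-scheduler"),
    (21, "Image Catalog and Delivery services"),
    (22, "nova-scheduler services"),
    (98, "cinder-api"),
    (99, "nova-api services"),
    (100, "Dashboard node"),
]


def restore_in_order(
    priorities: Iterable[Tuple[int, str]],
    start: Callable[[str], None],
    verify: Callable[[str], bool],
) -> List[str]:
    """Start each entry in ascending priority and verify it before moving on.

    `start` and `verify` are hypothetical, operator-supplied callbacks.
    Raises RuntimeError at the first entry whose verification fails, so
    later services are not started on top of an unstable environment.
    """
    restored: List[str] = []
    for priority, service in sorted(priorities, key=lambda item: item[0]):
        start(service)
        if not verify(service):
            raise RuntimeError(
                f"verification failed at priority {priority}: {service}"
            )
        restored.append(service)
    return restored


if __name__ == "__main__":
    # Dry run: print each step instead of touching real services.
    restore_in_order(
        RESTORE_PRIORITIES,
        start=lambda name: print(f"starting: {name}"),
        verify=lambda name: True,
    )
```

Stopping at the first failed check mirrors the advice above: a step such as a database integrity check should pass before anything that depends on it is brought up.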
