df134df901
Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring. The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s). Immediately stopping all swift-object-worker processes still works by sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process still causes the parent process to close all listen sockets and exit, allowing existing children to finish serving their existing requests. The drop_privileges helper function now has an optional param to suppress the setsid() call, which otherwise screws up the child workers' process management. The class method RingData.load() can be told to only load the ring metadata (i.e. everything except replica2part2dev_id) with the optional kwarg, header_only=True. This is used to keep the parent and all forked off workers from unnecessarily having full copies of all storage policy rings in memory. A new helper class, swift.common.storage_policy.BindPortsCache, provides a method to return a set of all device ports in all rings for the server on which it is instantiated (identified by its set of IP addresses). The BindPortsCache instance will track mtimes of ring files, so they are not opened more frequently than necessary. This patch includes enhancements to the probe tests and object-replicator/object-reconstructor config plumbing to allow the probe tests to work correctly both in the "normal" config (same IP but unique ports for each SAIO "server") and a server-per-port setup where each SAIO "server" must have a unique IP address and unique port per disk within each "server". The main probe tests only work with 4 servers and 4 disks, but you can see the difference in the rings for the EC probe tests where there are 2 disks per server for a total of 8 disks. Specifically, swift.common.ring.utils.is_local_device() will ignore the ports when the "my_port" argument is None. Then, object-replicator and object-reconstructor both set self.bind_port to None if server_per_port is enabled. Bonus improvement for IPv6 addresses in is_local_device(). This PR for vagrant-swift-all-in-one will aid in testing this patch: https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/ Also allow SAIO to answer is_local_device() better; common SAIO setups have multiple "servers" all on the same host with different ports for the different "servers" (which happen to match the IPs specified in the rings for the devices on each of those "servers"). However, you can configure the SAIO to have different localhost IP addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the servers' config files' bind_ip setting. This new whataremyips() implementation combined with a little plumbing allows is_local_device() to accurately answer, even on an SAIO. In the default case (an unspecified bind_ip defaults to '0.0.0.0') as well as an explict "bind to everything" like '0.0.0.0' or '::', whataremyips() behaves as it always has, returning all IP addresses for the server. Also updated probe tests to handle each "server" in the SAIO having a unique IP address. For some (noisy) benchmarks that show servers_per_port=X is at least as good as the same number of "normal" workers: https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md Benchmarks showing the benefits of I/O isolation with a small number of slow disks: https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md If you were wondering what the overhead of threads_per_disk looks like: https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md DocImpact Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
364 lines
13 KiB
Python
Executable File
364 lines
13 KiB
Python
Executable File
#!/usr/bin/python -u
|
|
# Copyright (c) 2010-2012 OpenStack Foundation
|
|
#
|
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
|
# you may not use this file except in compliance with the License.
|
|
# You may obtain a copy of the License at
|
|
#
|
|
# http://www.apache.org/licenses/LICENSE-2.0
|
|
#
|
|
# Unless required by applicable law or agreed to in writing, software
|
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
|
# implied.
|
|
# See the License for the specific language governing permissions and
|
|
# limitations under the License.
|
|
|
|
from hashlib import md5
|
|
import unittest
|
|
import uuid
|
|
import random
|
|
import shutil
|
|
from collections import defaultdict
|
|
|
|
from test.probe.common import ECProbeTest
|
|
|
|
from swift.common import direct_client
|
|
from swift.common.storage_policy import EC_POLICY
|
|
from swift.common.manager import Manager
|
|
from swift.obj import reconstructor
|
|
|
|
from swiftclient import client
|
|
|
|
|
|
class Body(object):
|
|
|
|
def __init__(self, total=3.5 * 2 ** 20):
|
|
self.total = total
|
|
self.hasher = md5()
|
|
self.size = 0
|
|
self.chunk = 'test' * 16 * 2 ** 10
|
|
|
|
@property
|
|
def etag(self):
|
|
return self.hasher.hexdigest()
|
|
|
|
def __iter__(self):
|
|
return self
|
|
|
|
def next(self):
|
|
if self.size > self.total:
|
|
raise StopIteration()
|
|
self.size += len(self.chunk)
|
|
self.hasher.update(self.chunk)
|
|
return self.chunk
|
|
|
|
def __next__(self):
|
|
return next(self)
|
|
|
|
|
|
class TestReconstructorRevert(ECProbeTest):
|
|
|
|
def setUp(self):
|
|
super(TestReconstructorRevert, self).setUp()
|
|
self.container_name = 'container-%s' % uuid.uuid4()
|
|
self.object_name = 'object-%s' % uuid.uuid4()
|
|
|
|
# sanity
|
|
self.assertEqual(self.policy.policy_type, EC_POLICY)
|
|
self.reconstructor = Manager(["object-reconstructor"])
|
|
|
|
def proxy_get(self):
|
|
# GET object
|
|
headers, body = client.get_object(self.url, self.token,
|
|
self.container_name,
|
|
self.object_name,
|
|
resp_chunk_size=64 * 2 ** 10)
|
|
resp_checksum = md5()
|
|
for chunk in body:
|
|
resp_checksum.update(chunk)
|
|
return resp_checksum.hexdigest()
|
|
|
|
def direct_get(self, node, part):
|
|
req_headers = {'X-Backend-Storage-Policy-Index': int(self.policy)}
|
|
headers, data = direct_client.direct_get_object(
|
|
node, part, self.account, self.container_name,
|
|
self.object_name, headers=req_headers,
|
|
resp_chunk_size=64 * 2 ** 20)
|
|
hasher = md5()
|
|
for chunk in data:
|
|
hasher.update(chunk)
|
|
return hasher.hexdigest()
|
|
|
|
def test_revert_object(self):
|
|
# create EC container
|
|
headers = {'X-Storage-Policy': self.policy.name}
|
|
client.put_container(self.url, self.token, self.container_name,
|
|
headers=headers)
|
|
|
|
# get our node lists
|
|
opart, onodes = self.object_ring.get_nodes(
|
|
self.account, self.container_name, self.object_name)
|
|
hnodes = self.object_ring.get_more_nodes(opart)
|
|
|
|
# kill 2 a parity count number of primary nodes so we can
|
|
# force data onto handoffs, we do that by renaming dev dirs
|
|
# to induce 507
|
|
p_dev1 = self.device_dir('object', onodes[0])
|
|
p_dev2 = self.device_dir('object', onodes[1])
|
|
self.kill_drive(p_dev1)
|
|
self.kill_drive(p_dev2)
|
|
|
|
# PUT object
|
|
contents = Body()
|
|
headers = {'x-object-meta-foo': 'meta-foo'}
|
|
headers_post = {'x-object-meta-bar': 'meta-bar'}
|
|
client.put_object(self.url, self.token, self.container_name,
|
|
self.object_name, contents=contents,
|
|
headers=headers)
|
|
client.post_object(self.url, self.token, self.container_name,
|
|
self.object_name, headers=headers_post)
|
|
del headers_post['X-Auth-Token'] # WTF, where did this come from?
|
|
|
|
# these primaries can't servce the data any more, we expect 507
|
|
# here and not 404 because we're using mount_check to kill nodes
|
|
for onode in (onodes[0], onodes[1]):
|
|
try:
|
|
self.direct_get(onode, opart)
|
|
except direct_client.DirectClientException as err:
|
|
self.assertEqual(err.http_status, 507)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(onode,))
|
|
|
|
# now take out another primary
|
|
p_dev3 = self.device_dir('object', onodes[2])
|
|
self.kill_drive(p_dev3)
|
|
|
|
# this node can't servce the data any more
|
|
try:
|
|
self.direct_get(onodes[2], opart)
|
|
except direct_client.DirectClientException as err:
|
|
self.assertEqual(err.http_status, 507)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(onode,))
|
|
|
|
# make sure we can still GET the object and its correct
|
|
# we're now pulling from handoffs and reconstructing
|
|
etag = self.proxy_get()
|
|
self.assertEqual(etag, contents.etag)
|
|
|
|
# rename the dev dirs so they don't 507 anymore
|
|
self.revive_drive(p_dev1)
|
|
self.revive_drive(p_dev2)
|
|
self.revive_drive(p_dev3)
|
|
|
|
# fire up reconstructor on handoff nodes only
|
|
for hnode in hnodes:
|
|
hnode_id = (hnode['port'] - 6000) / 10
|
|
self.reconstructor.once(number=hnode_id)
|
|
|
|
# first threee primaries have data again
|
|
for onode in (onodes[0], onodes[2]):
|
|
self.direct_get(onode, opart)
|
|
|
|
# check meta
|
|
meta = client.head_object(self.url, self.token,
|
|
self.container_name,
|
|
self.object_name)
|
|
for key in headers_post:
|
|
self.assertTrue(key in meta)
|
|
self.assertEqual(meta[key], headers_post[key])
|
|
|
|
# handoffs are empty
|
|
for hnode in hnodes:
|
|
try:
|
|
self.direct_get(hnode, opart)
|
|
except direct_client.DirectClientException as err:
|
|
self.assertEqual(err.http_status, 404)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(hnode,))
|
|
|
|
def test_delete_propogate(self):
|
|
# create EC container
|
|
headers = {'X-Storage-Policy': self.policy.name}
|
|
client.put_container(self.url, self.token, self.container_name,
|
|
headers=headers)
|
|
|
|
# get our node lists
|
|
opart, onodes = self.object_ring.get_nodes(
|
|
self.account, self.container_name, self.object_name)
|
|
hnodes = self.object_ring.get_more_nodes(opart)
|
|
p_dev2 = self.device_dir('object', onodes[1])
|
|
|
|
# PUT object
|
|
contents = Body()
|
|
client.put_object(self.url, self.token, self.container_name,
|
|
self.object_name, contents=contents)
|
|
|
|
# now lets shut one down
|
|
self.kill_drive(p_dev2)
|
|
|
|
# delete on the ones that are left
|
|
client.delete_object(self.url, self.token,
|
|
self.container_name,
|
|
self.object_name)
|
|
|
|
# spot check a node
|
|
try:
|
|
self.direct_get(onodes[0], opart)
|
|
except direct_client.DirectClientException as err:
|
|
self.assertEqual(err.http_status, 404)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(onodes[0],))
|
|
|
|
# enable the first node again
|
|
self.revive_drive(p_dev2)
|
|
|
|
# propogate the delete...
|
|
# fire up reconstructor on handoff nodes only
|
|
for hnode in hnodes:
|
|
hnode_id = (hnode['port'] - 6000) / 10
|
|
self.reconstructor.once(number=hnode_id)
|
|
|
|
# check the first node to make sure its gone
|
|
try:
|
|
self.direct_get(onodes[1], opart)
|
|
except direct_client.DirectClientException as err:
|
|
self.assertEqual(err.http_status, 404)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(onodes[0]))
|
|
|
|
# make sure proxy get can't find it
|
|
try:
|
|
self.proxy_get()
|
|
except Exception as err:
|
|
self.assertEqual(err.http_status, 404)
|
|
else:
|
|
self.fail('Node data on %r was not fully destoryed!' %
|
|
(onodes[0]))
|
|
|
|
def test_reconstruct_from_reverted_fragment_archive(self):
|
|
headers = {'X-Storage-Policy': self.policy.name}
|
|
client.put_container(self.url, self.token, self.container_name,
|
|
headers=headers)
|
|
|
|
# get our node lists
|
|
opart, onodes = self.object_ring.get_nodes(
|
|
self.account, self.container_name, self.object_name)
|
|
|
|
# find a primary server that only has one of it's devices in the
|
|
# primary node list
|
|
group_nodes_by_config = defaultdict(list)
|
|
for n in onodes:
|
|
group_nodes_by_config[self.config_number(n)].append(n)
|
|
for config_number, node_list in group_nodes_by_config.items():
|
|
if len(node_list) == 1:
|
|
break
|
|
else:
|
|
self.fail('ring balancing did not use all available nodes')
|
|
primary_node = node_list[0]
|
|
|
|
# ... and 507 it's device
|
|
primary_device = self.device_dir('object', primary_node)
|
|
self.kill_drive(primary_device)
|
|
|
|
# PUT object
|
|
contents = Body()
|
|
etag = client.put_object(self.url, self.token, self.container_name,
|
|
self.object_name, contents=contents)
|
|
self.assertEqual(contents.etag, etag)
|
|
|
|
# fix the primary device and sanity GET
|
|
self.revive_drive(primary_device)
|
|
self.assertEqual(etag, self.proxy_get())
|
|
|
|
# find a handoff holding the fragment
|
|
for hnode in self.object_ring.get_more_nodes(opart):
|
|
try:
|
|
reverted_fragment_etag = self.direct_get(hnode, opart)
|
|
except direct_client.DirectClientException as err:
|
|
if err.http_status != 404:
|
|
raise
|
|
else:
|
|
break
|
|
else:
|
|
self.fail('Unable to find handoff fragment!')
|
|
|
|
# we'll force the handoff device to revert instead of potentially
|
|
# racing with rebuild by deleting any other fragments that may be on
|
|
# the same server
|
|
handoff_fragment_etag = None
|
|
for node in onodes:
|
|
if self.is_local_to(node, hnode):
|
|
# we'll keep track of the etag of this fragment we're removing
|
|
# in case we need it later (queue forshadowing music)...
|
|
try:
|
|
handoff_fragment_etag = self.direct_get(node, opart)
|
|
except direct_client.DirectClientException as err:
|
|
if err.http_status != 404:
|
|
raise
|
|
# this just means our handoff device was on the same
|
|
# machine as the primary!
|
|
continue
|
|
# use the primary nodes device - not the hnode device
|
|
part_dir = self.storage_dir('object', node, part=opart)
|
|
shutil.rmtree(part_dir, True)
|
|
|
|
# revert from handoff device with reconstructor
|
|
self.reconstructor.once(number=self.config_number(hnode))
|
|
|
|
# verify fragment reverted to primary server
|
|
self.assertEqual(reverted_fragment_etag,
|
|
self.direct_get(primary_node, opart))
|
|
|
|
# now we'll remove some data on one of the primary node's partners
|
|
partner = random.choice(reconstructor._get_partners(
|
|
primary_node['index'], onodes))
|
|
|
|
try:
|
|
rebuilt_fragment_etag = self.direct_get(partner, opart)
|
|
except direct_client.DirectClientException as err:
|
|
if err.http_status != 404:
|
|
raise
|
|
# partner already had it's fragment removed
|
|
if (handoff_fragment_etag is not None and
|
|
self.is_local_to(hnode, partner)):
|
|
# oh, well that makes sense then...
|
|
rebuilt_fragment_etag = handoff_fragment_etag
|
|
else:
|
|
# I wonder what happened?
|
|
self.fail('Partner inexplicably missing fragment!')
|
|
part_dir = self.storage_dir('object', partner, part=opart)
|
|
shutil.rmtree(part_dir, True)
|
|
|
|
# sanity, it's gone
|
|
try:
|
|
self.direct_get(partner, opart)
|
|
except direct_client.DirectClientException as err:
|
|
if err.http_status != 404:
|
|
raise
|
|
else:
|
|
self.fail('successful GET of removed partner fragment archive!?')
|
|
|
|
# and force the primary node to do a rebuild
|
|
self.reconstructor.once(number=self.config_number(primary_node))
|
|
|
|
# and validate the partners rebuilt_fragment_etag
|
|
try:
|
|
self.assertEqual(rebuilt_fragment_etag,
|
|
self.direct_get(partner, opart))
|
|
except direct_client.DirectClientException as err:
|
|
if err.http_status != 404:
|
|
raise
|
|
else:
|
|
self.fail('Did not find rebuilt fragment on partner node')
|
|
|
|
|
|
if __name__ == "__main__":
|
|
unittest.main()
|