37 Commits

Author SHA1 Message Date
Alistair Coles
65e1de29d6 Fix shard_max_row in ContainerBroker.get_replication_info()
ContainerBroker adds the shard_max_row item to the
get_replication_info result by querying the db for the shard ranges
table max rowid. However, the wrong table name was being used in the
db query such that the value was always -1. This bug was benign
because the value of shard_max_row is not currently used.
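
As a rough sketch of how a wrong table name can silently produce -1, assume
the max row is looked up by table name in sqlite's sqlite_sequence table.
The helper name get_max_row_sketch, the table layout and the misspelled
name below are illustrative assumptions, not the actual Swift code:

```python
import sqlite3


def get_max_row_sketch(conn, table):
    # Hypothetical helper: look up the highest AUTOINCREMENT row id that
    # sqlite has recorded for the named table; -1 if there is no record.
    row = conn.execute(
        "SELECT seq FROM sqlite_sequence WHERE name = ? LIMIT 1",
        (table,)).fetchone()
    return row[0] if row else -1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shard_range "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO shard_range (name) VALUES (?)", [("a",), ("b",)])

good = get_max_row_sketch(conn, "shard_range")   # table that exists
bad = get_max_row_sketch(conn, "shard_ranges")   # misspelled table name
```

With a lookup like this a misspelled name raises no error; it simply matches
no row, which is one way such a bug could go unnoticed.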

Noticed while reviewing [1] which does make use of the shard_max_row
*key* in replication info.

[1] Related-Change: I7231e8af310e268484f2075f0194b7783cf1c3ea

Change-Id: I9e733e301894f1ffff4a1092926cc0df8419c5b5
2018-06-13 12:20:17 +01:00
Zuul
7dfe61bb4c Merge "Remove exclude_states from get_shard_ranges" 2018-05-22 13:04:30 +00:00
Tim Burke
9530ab20e3 Pre-storage-policy-index tests shouldn't have shard_range tables
Change-Id: Ib6aca2375a196d319bd955d5c458c37671f9e68d
2018-05-21 13:36:35 -07:00
Tim Burke
cc565db753 Remove exclude_states from get_shard_ranges
It was never actually used -- YAGNI.

Change-Id: I2f7d1bc698ff4c0734ab48eb29e252d6acb1abc6
2018-05-21 11:24:39 -07:00
Matthew Oliver
2641814010 Add sharder daemon, manage_shard_ranges tool and probe tests
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.

The workflow is, in overview:

- perform an audit of the container for sharding purposes.

- move any misplaced objects that do not belong in the container
  to their correct shard.

- move shard ranges from FOUND state to CREATED state by creating
  shard containers.

- move shard ranges from CREATED to CLEAVED state by cleaving objects
  to shard dbs and replicating those dbs. By default this is done in
  batches of 2 shard ranges per visit.
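
A minimal sketch of the last two steps above, assuming shard ranges are
plain dicts with a "state" key. The function name visit_sketch and the
cleave_batch_size parameter are made up for illustration; the audit and
misplaced-object steps are omitted, and no dbs are actually created or
replicated:

```python
FOUND, CREATED, CLEAVED = "found", "created", "cleaved"


def visit_sketch(shard_ranges, cleave_batch_size=2):
    # Move every FOUND range to CREATED (stands in for creating the
    # shard containers).
    for shard_range in shard_ranges:
        if shard_range["state"] == FOUND:
            shard_range["state"] = CREATED
    # Cleave at most cleave_batch_size CREATED ranges on this visit.
    cleaved = 0
    for shard_range in shard_ranges:
        if cleaved >= cleave_batch_size:
            break
        if shard_range["state"] == CREATED:
            shard_range["state"] = CLEAVED
            cleaved += 1
    return cleaved


ranges = [{"name": name, "state": FOUND} for name in ("a", "b", "c")]
first_visit = visit_sketch(ranges)
states_after_first = [r["state"] for r in ranges]
second_visit = visit_sketch(ranges)
states_after_second = [r["state"] for r in ranges]
```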

Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.

The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
2018-05-18 18:48:13 +01:00
Alistair Coles
14af38a899 Add support for sharding in ContainerBroker
With this patch the ContainerBroker gains several new features:

1. A shard_ranges table to persist ShardRange data, along with
methods to merge and access ShardRange instances to that table,
and to remove expired shard ranges.

2. The ability to create a fresh db file to replace the existing db
file. Fresh db files are named using the hash of the container path
plus an epoch which is a serialized Timestamp value, in the form:

  <hash>_<epoch>.db

During sharding both the fresh and retiring db files co-exist on
disk. The ContainerBroker is now able to choose the newest on disk db
file when instantiated. It also provides a method (get_brokers()) to
gain access to a broker instance for either on disk file.

3. Methods to access the current state of the on disk db files i.e.
UNSHARDED (old file only), SHARDING (fresh and retiring files), or
SHARDED (fresh file only with shard ranges).
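
The file naming in item 2 can be sketched as follows. This is only an
illustration: parse_db_filename and newest_db_file are hypothetical helpers,
the hash value is arbitrary, and it is assumed here that the retiring file
is named <hash>.db with no epoch and that epochs sort correctly as strings:

```python
import os


def parse_db_filename(path):
    # Split '<hash>.db' or '<hash>_<epoch>.db' into (hash, epoch or None).
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext != ".db":
        raise ValueError("not a db file: %r" % path)
    db_hash, _, epoch = stem.partition("_")
    return db_hash, epoch or None


def newest_db_file(paths):
    # A file without an epoch is treated as older than any file with one.
    return max(paths, key=lambda p: parse_db_filename(p)[1] or "")


retiring = "d41d8cd98f00b204e9800998ecf8427e.db"
fresh = "d41d8cd98f00b204e9800998ecf8427e_1526665358.00000.db"
chosen = newest_db_file([retiring, fresh])
```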

Container replication is also modified:

1. shard ranges are replicated between container db peers. Unlike
objects, shard ranges are both pushed and pulled during a REPLICATE
event.

2. If a container db is capable of being sharded (i.e. it has a set of
shard ranges) then it will no longer attempt to replicate objects to
its peers. Object record durability is achieved by sharding rather than
peer to peer replication.

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: Ie4d2816259e6c25c346976e181fb9d350f947190
2018-05-18 18:42:38 +01:00
Alistair Coles
9d742b85ad Refactoring, test infrastructure changes and cleanup
...in preparation for the container sharding feature.

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I4455677abb114a645cff93cd41b394d227e805de
2018-05-15 18:18:25 +01:00
Jenkins
83b62b4f39 Merge "Add Timestamp.now() helper" 2017-07-18 03:27:50 +00:00
Alistair Coles
7a9269bca3 Fix unit test failing when swift.conf has default policy index >10
In unit/container/test_backend.py test_policy_stat_tracking in
classes TestContainerBroker[BeforeMetadata|BeforeSPI|BeforeXSync]
fails if the default policy in /etc/swift/swift.conf has an index >10.

Those classes monkey patch the container broker to pre-storage-policy
index behaviour, so that its policy index will always be 0. The test
fails with a KeyError when asserting that the broker should have stats
for the POLICIES.default index even when no objects have been PUT with
that policy index. When the default policy in swift.conf has index
>10, that is neither the broker's policy index (0) nor the policy
index of any other object that has been PUT during the test.

The test needs a patch_policies decorator to remove the coupling with
swift.conf policies. However, the assertion that the broker has stats
for its policy index even when no objects were PUT to that index is
then extremely unlikely to ever be tested, because the broker's
default policy index of 0 is very likely to have been used for a PUT
object. So this patch also repeats that assertion before any objects
have been PUT.

Closes-Bug: #1687029
Change-Id: I8b3678dac83f7329d835059c9973b994bc975a33
2017-05-01 09:24:00 +01:00
Tim Burke
85d6cd30be Add Timestamp.now() helper
Often, we want the current timestamp. May as well improve the ergonomics
a bit and provide a class method for it.
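
A minimal sketch of the idea, using a stripped-down stand-in class rather
than the real Timestamp in common.utils (SketchTimestamp and its single
attribute are assumptions for illustration):

```python
import time


class SketchTimestamp(object):
    """Hypothetical, heavily simplified stand-in for Timestamp."""

    def __init__(self, timestamp):
        self.timestamp = float(timestamp)

    @classmethod
    def now(cls):
        # Convenience constructor: wrap the current time.
        return cls(time.time())


before = time.time()
ts = SketchTimestamp.now()   # instead of SketchTimestamp(time.time())
after = time.time()
```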

Change-Id: I3581c635c094a8c4339e9b770331a03eab704074
2017-04-27 14:19:00 -07:00
Luong Anh Tuan
19a684dded Fix using filter() to meet python2,3
As mentioned in link[1], if we need filter on python3,
replace filter(lambda obj: test(obj), data) with:
[obj for obj in data if test(obj)].

[1] https://wiki.openstack.org/wiki/Python3
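
A short illustration of the replacement, with made-up data and a made-up
predicate (is_even is not from the patch):

```python
def is_even(n):
    return n % 2 == 0


data = [1, 2, 3, 4, 5, 6]

# On Python 3 filter() returns a lazy iterator, so code that expects a
# list (len(), indexing, repeated iteration) no longer works with it.
lazy = filter(lambda obj: is_even(obj), data)

# The list comprehension gives a real list on both Python 2 and 3.
evens = [obj for obj in data if is_even(obj)]
```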

Change-Id: Ia1ea2ec89e4beb957a4cb358b0d0cef970f23e0a
2016-09-22 07:32:38 +07:00
Lokesh S
49f250736d Python3 fixes generator object issue
Fixes 'generator' object has no attribute
'next' issues

Change-Id: I1f21eaed0ae7062073438503d3f6860d8b4f36c8
2016-07-20 19:06:45 +00:00
Jenkins
60c127575b Merge "swift-[account|container]-info when disk is full" 2016-04-22 17:31:09 +00:00
Jenkins
f227072974 Merge "Correctly handle keys starting with the delimiter." 2016-04-01 02:02:44 +00:00
Janie Richling
e97c4f794d swift-[account|container]-info when disk is full
Extended the use of the DatabaseBroker "stale_reads_ok" flag to the
AccountBroker and ContainerBroker.  Now checks for an sqlite3 error
from the _commit_puts call that processes the pending files.

If this error is raised, then the stale_reads_ok flag will be checked
to determine how to proceed as opposed to simply raising.

The first time that print_info is attempted, the flag will be
false, but swift-[account|container]-info will check for the
raised exception.  If it was raised, then a warning is reported
that the data may be stale, and another attempt will be
made using the stale_reads_ok=True flag.
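
The retry flow described above could look roughly like this. SketchBroker
and print_info_sketch are hypothetical stand-ins written for this note, the
full disk is simulated, and the returned info is dummy data:

```python
import sqlite3


class SketchBroker(object):
    """Hypothetical stand-in for a broker; not Swift's real class."""

    def __init__(self, stale_reads_ok=False, disk_full=True):
        self.stale_reads_ok = stale_reads_ok
        self.disk_full = disk_full

    def _commit_puts(self):
        # Simulate processing the pending files failing on a full disk.
        if self.disk_full:
            raise sqlite3.OperationalError("database or disk is full")

    def get_info(self):
        try:
            self._commit_puts()
        except sqlite3.OperationalError:
            # Only swallow the error if the caller accepts stale data.
            if not self.stale_reads_ok:
                raise
        return {"object_count": 0}  # dummy, possibly stale, info


def print_info_sketch(broker_factory):
    messages = []
    try:
        info = broker_factory(stale_reads_ok=False).get_info()
    except sqlite3.OperationalError:
        messages.append("Warning: data may be stale")
        info = broker_factory(stale_reads_ok=True).get_info()
    return info, messages


info, messages = print_info_sketch(SketchBroker)
```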

Change-Id: I761526eef62327888c865d87a9caafa3e7eabab6
Closes-Bug: 1531302
2016-03-24 20:11:24 -05:00
Alistair Coles
e91de49d68 Update container on fast-POST
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].

* the object server and diskfile are modified to allow
  content-type to be updated by a POST and the updated value
  to be stored in .meta files.

* the object server accepts PUTs and DELETEs with older
  timestamps than existing .meta files. This is to be
  consistent with replication that will leave a later .meta
  file in place when replicating a .data file.

* the diskfile interface is modified to provide accessor
  methods for the content-type and its timestamp.

* the naming of .meta files is modified to encode two
  timestamps when the .meta file contains a content-type value
  that was set prior to the latest metadata update; this
  enables consistency to be achieved when rsync is used for
  replication.

* ssync is modified to sync meta files when content-type
  differs between local and remote copies of objects.

* the object server issues container updates when handling
  POST requests, notifying the container server of the current
  immutable metadata (etag, size, hash, swift_bytes),
  content-type with their respective timestamps, and the
  mutable metadata timestamp.

* the container server maintains the most recently reported
  values for immutable metadata, content-type and mutable
  metadata, each with their respective timestamps, in a single
  db row.

* new probe tests verify that replication achieves eventual
  consistency of containers and objects after discrete updates
  to content-type and mutable metadata, and that container-sync
  syncs objects after fast-post updates.

[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
2016-03-03 14:25:10 +00:00
Timur Alperovich
432e280aef Correctly handle keys starting with the delimiter.
When processing keys where the names start with the delimiter
character, swift should list only the delimiter character. To get the
list of nested keys, the caller should also supply the prefix which is
equal to the delimiter.
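
The intended listing behavior can be modelled in memory as below. This is a
sketch only: list_with_delimiter is a hypothetical function operating on a
plain list of names, not the container server's db query:

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    # Return object names under prefix, rolling up anything beyond the
    # next delimiter into a single "subdir" entry.
    results = []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        idx = rest.find(delimiter)
        if idx >= 0:
            entry = prefix + rest[:idx + len(delimiter)]
        else:
            entry = name
        if entry not in results:
            results.append(entry)
    return results


names = ["/a", "/b/c", "x"]
top_level = list_with_delimiter(names)               # only "/" and "x"
nested = list_with_delimiter(names, prefix="/")      # keys under "/"
```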

Added a functional test and unit tests to verify this behavior.

Fixes Bug: 1475018

Change-Id: I27701a31bfa22842c272b7781738e8c546b82cbc
2016-01-06 12:29:20 -08:00
Richard Hawkins
9d7f71d575 Modify functional tests to use ostestr/testr
Defcore uses Tempest, which uses Test Repository.
This change makes it easier for Defcore to pull functional
tests from Swift and run them.  Additionally, using testr
allows tests to be run in parallel.

Concurrency set to 1 for now, >1 causes failures for
reasons that are still TBD.

With switch to ostestr all the server logs are being sent to stdout
which makes it completely unreadable. Suppressing the logs by default
now with a flag to enable it if desired.

Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Robert Collins <rbtcollins@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>

Change-Id: I53ef4a116996a772cf1f3abc2eb0ad60047322d5
Related-Bug: 1177924
2015-12-15 22:30:44 +00:00
Tim Burke
41897d96a7 Reverse-listings follow-up
* With the end_prefix changes in the original commit, we no longer need
  the `or not name.startswith(prefix)` check.
* Improve test coverage of reverse path listings.

Change-Id: Iaa7d4b83647c3c150be95f88cb3cc9e4f0e33979
2015-11-24 08:53:51 -08:00
Matthew Oliver
7c1e6cd583 Add container and account reverse listings
This change adds the ability to tell the container or account server to
reverse their listings. This is done by sending reverse=TRUE_VALUE as a
query parameter, where TRUE_VALUE is one of the values true can be in
common/utils:

  TRUE_VALUES = set(('true', '1', 'yes', 'on', 't', 'y'))

For example:

  curl -i -X GET -H "X-Auth-Token: $TOKEN" $STORAGE_URL/c/?reverse=on
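
A rough sketch of how such a parameter might be interpreted. The helper
names is_true_value and listing_sketch are made up here, and the
case-insensitive match is an assumption:

```python
TRUE_VALUES = set(('true', '1', 'yes', 'on', 't', 'y'))


def is_true_value(value):
    # Hypothetical helper; lower-casing the value is an assumption.
    return value is True or (
        isinstance(value, str) and value.lower() in TRUE_VALUES)


def listing_sketch(names, reverse=None):
    # Sort ascending by default, descending when reverse is a true value.
    return sorted(names, reverse=is_true_value(reverse))


forward = listing_sketch(["b", "a", "c"])
backward = listing_sketch(["b", "a", "c"], reverse="on")
ignored = listing_sketch(["b", "a", "c"], reverse="no")
```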

I borrowed the swapping of the markers code from Kevin's old change,
thanks Kevin. And Tim Burke added some real nuggets of awesomeness.

DocImpact
Co-Authored-By: Kevin McDonald <kmcdonald@softlayer.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Implements: blueprint reverse-object-listing

Change-Id: I5eb655360ac95042877da26d18707aebc11c02f6
2015-11-24 15:08:45 +00:00
janonymous
1882801be1 pep8 fix: assertNotEquals -> assertNotEqual
assertNotEquals is deprecated in py3

Change-Id: Ib611351987bed1199fb8f73a750955a61d022d0a
2015-10-12 07:40:07 +00:00
janonymous
ed3aec2146 pep8 fix: assertEquals -> assertEqual
assertEquals is deprecated in py3 in
dir: test/unit/container/*

Change-Id: I3333022ed63ce03198bc73147246d91d2442a440
2015-08-06 00:18:52 +05:30
Victor Stinner
a0db56dcde Fix pep8 E265 warning of hacking 0.10
Fix the warning E265 "block comment should start with '# '" added in pep8
1.5.

Change-Id: Ib57282e958be9c7cddffc7bca34fbbf1d4c460fd
2015-07-30 09:33:18 +02:00
janonymous
cd7b2db550 unit tests: Replace "self.assert_" by "self.assertTrue"
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.

Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
2015-07-21 19:23:00 +05:30
Victor Stinner
e5c962a28c Replace xrange() with six.moves.range()
Patch generated by the xrange operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Manual changes:

* Fix indentation for pep8 checks
* Fix TestGreenthreadSafeIterator.test_access_is_serialized of
  test.unit.common.test_utils:
  replace range(1, 11) with list(range(1, 11))
* Fix UnsafeXrange docstring, revert change

Change-Id: Icb7e26135c5e57b5302b8bfe066b33cafe69fe4d
2015-06-23 07:29:15 +00:00
janonymous
09e7477a39 Replace it.next() with next(it) for py3 compat
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
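
For example (the generator below is made up for illustration):

```python
def numbers():
    yield 1
    yield 2


it = numbers()

# it.next() only exists on Python 2; the builtin works on both.
first = next(it)
second = next(it)

# A default avoids StopIteration once the iterator is exhausted.
third = next(it, None)
```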

Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
2015-06-15 22:10:45 +05:30
Kota Tsuyuzaki
16b435a4a8 Fix ContainerBroker to use policy-0 in default
Fix ContainerBroker to initialize as policy-0 in the policy_stat
table by default when the storage_policy_index argument is NOT given.

Currently ContainerBroker creates policy-1 stats by default because
the "None" value is passed through to the final db access query (i.e.
a query like "INSERT INTO policy_stat (storage_policy_index) VALUES
(None)" is issued), which results in a row "(1, 0, 0)" (the first
value is the policy index) because of the PRIMARY KEY constraint in
sqlite.
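
The sqlite behavior can be reproduced in isolation. The table definition
below is a simplified assumption (only the column name
storage_policy_index is taken from the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy_stat ("
    "storage_policy_index INTEGER PRIMARY KEY, "
    "object_count INTEGER DEFAULT 0, "
    "bytes_used INTEGER DEFAULT 0)")

# Binding None inserts NULL; for an INTEGER PRIMARY KEY column sqlite
# replaces NULL with an automatically chosen row id.
conn.execute(
    "INSERT INTO policy_stat (storage_policy_index) VALUES (?)", (None,))
rows = conn.execute("SELECT * FROM policy_stat").fetchall()
```

On an empty table the automatically chosen id is 1, hence the policy-1 row.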

In the worst case, the container db keeps two policies, and
ContainerBroker.get_info might then return invalid (untouched)
policy_stat information as container information. (See the tests for
details.)

When ContainerBroker is used with no storage_policy_index argument,
it should simply always act as policy-0.

Note that this patch doesn't immediately affect Swift behavior because
current Swift ensures that policy-0 is used on "Container-Server" when an
invalid policy (including None) is given. However, we should also check
in ContainerBroker for safety, to prevent the unfortunate behavior above.

Change-Id: If64f0c94c069a2cc3140c99f21b8d371c183e28a
2015-02-23 01:13:40 -08:00
Michael Barton
eaab4d3fd6 container.merge_items bug
When replicated container entries get round-tripped through json, they wind up
with unicode objects for names.  This causes equality checks to fail against
container entries, and you can wind up with duplicate records.  My bad.

Change-Id: I3aee2ad8dbd3a617efe37e887cfb902a3e4a1646
2014-09-18 12:36:26 -07:00
Clay Gerrard
c1dc2fa624 Add two vector timestamps
The normalized form of the X-Timestamp header looks like a float with a fixed
width to ensure stable string sorting - normalized timestamps look like
"1402464677.04188"

To support overwrites of existing data without modifying the original
timestamp but still maintain consistency a second internal offset
vector is appended to the normalized timestamp form which compares and
sorts greater than the fixed width float format but less than a newer
timestamp.  The internalized format of timestamps looks like
"1402464677.04188_0000000000000000" - the portion after the underscore
is the offset and is a formatted hexadecimal integer.

The internalized form is not exposed to clients in responses from Swift.
Normal client operations will not create a timestamp with an offset.

The Timestamp class in common.utils supports internalized and normalized
formatting of timestamps and also comparison of timestamp values.  When the
offset value of a Timestamp is 0 - it's considered insignificant and need not
be represented in the string format; to support backwards compatibility during
a Swift upgrade the internalized and normalized form of a Timestamp with an
insignificant offset are identical.  When a timestamp includes an offset it
will always be represented in the internalized form, but is still excluded
from the normalized form.  Timestamps with an equivalent timestamp portion
(the float part) will compare and order by their offset.  Timestamps with a
greater timestamp portion will always compare and order greater than a
Timestamp with a lesser timestamp regardless of its offset.  String
comparison and ordering is guaranteed for the internalized string format, and
is backwards compatible for normalized timestamps which do not include an
offset.
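
The two string forms and their ordering can be sketched as below. The
function names and format strings are assumptions chosen to reproduce the
examples above, not a copy of the Timestamp class:

```python
def normal_form(timestamp):
    # Fixed-width float, e.g. "1402464677.04188".
    return "%016.05f" % timestamp


def internal_form(timestamp, offset=0):
    # An insignificant (zero) offset is not represented at all.
    if not offset:
        return normal_form(timestamp)
    # Otherwise append the offset as a 16 digit hexadecimal integer.
    return "%s_%016x" % (normal_form(timestamp), offset)


plain = internal_form(1402464677.04188)
bumped = internal_form(1402464677.04188, offset=1)
newer = internal_form(1402464677.04189)

# Plain string sorting gives the intended order.
ordered = sorted([newer, bumped, plain])
```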

The reconciler currently uses an offset bump to ensure that objects can move to
the wrong storage policy and be moved back.  This use-case is valid because
the content represented by the user-facing timestamp is not modified in any way.
Future consumers of the offset vector of timestamps should be mindful of HTTP
semantics of If-Modified and take care to avoid deviation in the response from
the object server without an accompanying change to the user facing timestamp.

DocImpact
Implements: blueprint storage-policies
Change-Id: Id85c960b126ec919a481dc62469bf172b7fb8549
2014-06-19 10:18:06 -07:00
Pete Zaitcev
b02f0db126 Refactoring storage policies merge_timestamps
* base implementation of is_deleted phrased to use _is_deleted
* wrap pre-conn coded _is_deleted inside a transaction for merge_timestamps

Implements: blueprint storage-policies
Change-Id: I6a948908c3e45b70707981d87171cb2cb910fe1e
2014-06-18 20:57:09 -07:00
Clay Gerrard
4321bb0af6 Add Storage Policy support to Containers
Containers now have a storage policy index associated with them,
stored in the container_stat table. This index is only settable at
container creation time (PUT request), and cannot be changed without
deleting and recreating the container. This is because a container's
policy index will apply to all its objects, so changing a container's
policy index would require moving large amounts of object data
around. If a user wants to change the policy for data in a container,
they must create a new container with the desired policy and move the
data over.

Keep status_changed_at up-to-date with status changes.

In particular during container recreation and replication.

When a container-server receives a PUT for a deleted database an extra UPDATE
is issued against the container_stat table to notate the x-timestamp of the
request.

During replication if merge_timestamps causes a container's status to change
(from DELETED to ACTIVE or vice-versa) the status_changed_at field is set to
the current time.

Accurate reporting of status_changed_at is useful for container replication
forensics and allows resolution of "set on create" attributes like the
upcoming storage_policy_index.

Expose Backend container info on deleted containers.

Include basic container info in backend headers on 404 responses from the
container server.  Default empty values are used as placeholders if the
database does not exist.

Specifically the X-Backend-Status-Changed-At, X-Backend-DELETE-Timestamp and
the X-Backend-Storage-Policy-Index value will be needed by the reconciler to
deal with reconciling out of order object writes in the face of recently
deleted containers.

 * Add "status_changed_at" key to the response from ContainerBroker.get_info.
 * Add "Status Timestamp" field to swift.cli.info.print_db_info_metadata.
 * Add "status_changed_at" key to the response from AccountBroker.get_info.

DocImpact
Implements: blueprint storage-policies
Change-Id: Ie6d388f067f5b096b0f96faef151120ba23c8748
2014-06-18 17:31:38 -07:00
Alex Gaynor
032f0bfc7c Fix several typos in the codebase.
These were found using https://github.com/intgr/topy

Change-Id: I0dc7b76c44b8b17b1dcd79184dad1516fb11173c
2014-04-25 20:14:09 -07:00
Samuel Merritt
b5b0b78fc7 Remove obsolete future imports
The with statement has been standard since Python 2.5, so we can get
rid of these imports.

Change-Id: I280971c3d8c01e94cc2c17cacaedcbe9d9c8a3c3
2013-11-22 12:23:58 -08:00
Peter Portante
9411a24ba7 Revert "Refactor common/utils methods to common/ondisk"
This reverts commit 7760f41c3ce436cb23b4b8425db3749a3da33d32

Change-Id: I95e57a2563784a8cd5e995cc826afeac0eadbe62
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-10-07 17:18:09 -04:00
ZhiQiang Fan
f72704fc82 Change OpenStack LLC to Foundation
Change-Id: I7c3df47c31759dbeb3105f8883e2688ada848d58
Closes-bug: #1214176
2013-09-20 01:02:31 +08:00
Peter Portante
7760f41c3c Refactor common/utils methods to common/ondisk
Place all the methods related to on-disk layout and / or configuration
into a new common module that can be shared by the various modules
using the same on-disk layout.

Change-Id: I27ffd4665d5115ffdde649c48a4d18e12017e6a9
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-09-17 17:32:04 -04:00
Pete Zaitcev
d4b024ad7d Split backends off swift/common/db.py
The main purpose of this patch is to lay the groundwork for allowing
the container and account servers to optionally use pluggable backend
implementations. The backend.py files will eventually be the module
where the backend APIs are defined via docstrings of this reference
implementation. The swift/common/db.py module will remain an internal
module used by the reference implementation.

We have a raft of changes to docstrings staged for later, but this
patch takes care to relocate ContainerBroker and AccountBroker into
their new home intact.

Change-Id: Ibab5c7605860ab768c8aa5a3161a705705689b04
2013-09-10 13:30:28 -06:00