63f8c2284a
The purpose of GateKeeper mostly relates to the development of new swift code, so I threw together a guide for development_middleware that covers some basics with a eye towards metadata handling in-particular. I also fixed up some missing autodoc's, split out middleware autodoc and added some ref's here and about so I could link to them from the development_middleware guide. DocImpact Change-Id: I20dd942ea8df9e33c3e794cb49669ffa1332c63e
220 lines
10 KiB
ReStructuredText
220 lines
10 KiB
ReStructuredText
.. _large-objects:
|
|
|
|
====================
|
|
Large Object Support
|
|
====================
|
|
|
|
--------
|
|
Overview
|
|
--------
|
|
|
|
Swift has a limit on the size of a single uploaded object; by default this is
|
|
5GB. However, the download size of a single object is virtually unlimited with
|
|
the concept of segmentation. Segments of the larger object are uploaded and a
|
|
special manifest file is created that, when downloaded, sends all the segments
|
|
concatenated as a single object. This also offers much greater upload speed
|
|
with the possibility of parallel uploads of the segments.
|
|
|
|
---------------------
|
|
Dynamic Large Objects
|
|
---------------------
|
|
|
|
---------------
|
|
Using ``swift``
|
|
---------------
|
|
|
|
The quickest way to try out this feature is use the ``swift`` Swift Tool
|
|
included with the `python-swiftclient`_ library. You can use the ``-S``
|
|
option to specify the segment size to use when splitting a large file. For
|
|
example::
|
|
|
|
swift upload test_container -S 1073741824 large_file
|
|
|
|
This would split the large_file into 1G segments and begin uploading those
|
|
segments in parallel. Once all the segments have been uploaded, ``swift`` will
|
|
then create the manifest file so the segments can be downloaded as one.
|
|
|
|
So now, the following ``swift`` command would download the entire large object::
|
|
|
|
swift download test_container large_file
|
|
|
|
``swift`` uses a strict convention for its segmented object
|
|
support. In the above example it will upload all the segments into a
|
|
second container named test_container_segments. These segments will
|
|
have names like large_file/1290206778.25/21474836480/00000000,
|
|
large_file/1290206778.25/21474836480/00000001, etc.
|
|
|
|
The main benefit for using a separate container is that the main container
|
|
listings will not be polluted with all the segment names. The reason for using
|
|
the segment name format of <name>/<timestamp>/<size>/<segment> is so that an
|
|
upload of a new file with the same name won't overwrite the contents of the
|
|
first until the last moment when the manifest file is updated.
|
|
|
|
``swift`` will manage these segment files for you, deleting old segments on
|
|
deletes and overwrites, etc. You can override this behavior with the
|
|
``--leave-segments`` option if desired; this is useful if you want to have
|
|
multiple versions of the same large object available.
|
|
|
|
.. _`python-swiftclient`: http://github.com/openstack/python-swiftclient
|
|
|
|
----------
|
|
Direct API
|
|
----------
|
|
|
|
You can also work with the segments and manifests directly with HTTP
|
|
requests instead of having ``swift`` do that for you. You can just
|
|
upload the segments like you would any other object and the manifest
|
|
is just a zero-byte file with an extra ``X-Object-Manifest`` header.
|
|
|
|
All the object segments need to be in the same container, have a common object
|
|
name prefix, and their names sort in the order they should be concatenated.
|
|
They don't have to be in the same container as the manifest file will be, which
|
|
is useful to keep container listings clean as explained above with ``swift``.
|
|
|
|
The manifest file is simply a zero-byte file with the extra
|
|
``X-Object-Manifest: <container>/<prefix>`` header, where ``<container>`` is
|
|
the container the object segments are in and ``<prefix>`` is the common prefix
|
|
for all the segments.
|
|
|
|
It is best to upload all the segments first and then create or update the
|
|
manifest. In this way, the full object won't be available for downloading until
|
|
the upload is complete. Also, you can upload a new set of segments to a second
|
|
location and then update the manifest to point to this new location. During the
|
|
upload of the new segments, the original manifest will still be available to
|
|
download the first set of segments.
|
|
|
|
Here's an example using ``curl`` with tiny 1-byte segments::
|
|
|
|
# First, upload the segments
|
|
curl -X PUT -H 'X-Auth-Token: <token>' \
|
|
http://<storage_url>/container/myobject/1 --data-binary '1'
|
|
curl -X PUT -H 'X-Auth-Token: <token>' \
|
|
http://<storage_url>/container/myobject/2 --data-binary '2'
|
|
curl -X PUT -H 'X-Auth-Token: <token>' \
|
|
http://<storage_url>/container/myobject/3 --data-binary '3'
|
|
|
|
# Next, create the manifest file
|
|
curl -X PUT -H 'X-Auth-Token: <token>' \
|
|
-H 'X-Object-Manifest: container/myobject/' \
|
|
http://<storage_url>/container/myobject --data-binary ''
|
|
|
|
# And now we can download the segments as a single object
|
|
curl -H 'X-Auth-Token: <token>' \
|
|
http://<storage_url>/container/myobject
|
|
|
|
--------------------
|
|
Static Large Objects
|
|
--------------------
|
|
|
|
----------
|
|
Direct API
|
|
----------
|
|
|
|
SLO support centers around the user generated manifest file. After the user
|
|
has uploaded the segments into their account a manifest file needs to be
|
|
built and uploaded. All object segments, except the last, must be above 1 MB
|
|
(by default) in size. Please see the SLO docs for :ref:`slo-doc` further
|
|
details.
|
|
|
|
----------------
|
|
Additional Notes
|
|
----------------
|
|
|
|
* With a ``GET`` or ``HEAD`` of a manifest file, the ``X-Object-Manifest:
|
|
<container>/<prefix>`` header will be returned with the concatenated object
|
|
so you can tell where it's getting its segments from.
|
|
|
|
* The response's ``Content-Length`` for a ``GET`` or ``HEAD`` on the manifest
|
|
file will be the sum of all the segments in the ``<container>/<prefix>``
|
|
listing, dynamically. So, uploading additional segments after the manifest is
|
|
created will cause the concatenated object to be that much larger; there's no
|
|
need to recreate the manifest file.
|
|
|
|
* The response's ``Content-Type`` for a ``GET`` or ``HEAD`` on the manifest
|
|
will be the same as the ``Content-Type`` set during the ``PUT`` request that
|
|
created the manifest. You can easily change the ``Content-Type`` by reissuing
|
|
the ``PUT``.
|
|
|
|
* The response's ``ETag`` for a ``GET`` or ``HEAD`` on the manifest file will
|
|
be the MD5 sum of the concatenated string of ETags for each of the segments
|
|
in the manifest (for DLO, from the listing ``<container>/<prefix>``).
|
|
Usually in Swift the ETag is the MD5 sum of the contents of the object, and
|
|
that holds true for each segment independently. But it's not meaningful to
|
|
generate such an ETag for the manifest itself so this method was chosen to
|
|
at least offer change detection.
|
|
|
|
|
|
.. note::
|
|
|
|
If you are using the container sync feature you will need to ensure both
|
|
your manifest file and your segment files are synced if they happen to be
|
|
in different containers.
|
|
|
|
-------
|
|
History
|
|
-------
|
|
|
|
Dynamic large object support has gone through various iterations before
|
|
settling on this implementation.
|
|
|
|
The primary factor driving the limitation of object size in swift is
|
|
maintaining balance among the partitions of the ring. To maintain an even
|
|
dispersion of disk usage throughout the cluster the obvious storage pattern
|
|
was to simply split larger objects into smaller segments, which could then be
|
|
glued together during a read.
|
|
|
|
Before the introduction of large object support some applications were already
|
|
splitting their uploads into segments and re-assembling them on the client
|
|
side after retrieving the individual pieces. This design allowed the client
|
|
to support backup and archiving of large data sets, but was also frequently
|
|
employed to improve performance or reduce errors due to network interruption.
|
|
The major disadvantage of this method is that knowledge of the original
|
|
partitioning scheme is required to properly reassemble the object, which is
|
|
not practical for some use cases, such as CDN origination.
|
|
|
|
In order to eliminate any barrier to entry for clients wanting to store
|
|
objects larger than 5GB, initially we also prototyped fully transparent
|
|
support for large object uploads. A fully transparent implementation would
|
|
support a larger max size by automatically splitting objects into segments
|
|
during upload within the proxy without any changes to the client API. All
|
|
segments were completely hidden from the client API.
|
|
|
|
This solution introduced a number of challenging failure conditions into the
|
|
cluster, wouldn't provide the client with any option to do parallel uploads,
|
|
and had no basis for a resume feature. The transparent implementation was
|
|
deemed just too complex for the benefit.
|
|
|
|
The current "user manifest" design was chosen in order to provide a
|
|
transparent download of large objects to the client and still provide the
|
|
uploading client a clean API to support segmented uploads.
|
|
|
|
To meet an many use cases as possible swift supports two types of large
|
|
object manifests. Dynamic and static large object manifests both support
|
|
the same idea of allowing the user to upload many segments to be later
|
|
downloaded as a single file.
|
|
|
|
Dynamic large objects rely on a container lising to provide the manifest.
|
|
This has the advantage of allowing the user to add/removes segments from the
|
|
manifest at any time. It has the disadvantage of relying on eventually
|
|
consistent container listings. All three copies of the container dbs must
|
|
be updated for a complete list to be guaranteed. Also, all segments must
|
|
be in a single container, which can limit concurrent upload speed.
|
|
|
|
Static large objects rely on a user provided manifest file. A user can
|
|
upload objects into multiple containers and then reference those objects
|
|
(segments) in a self generated manifest file. Future GETs to that file will
|
|
download the concatenation of the specified segments. This has the advantage of
|
|
being able to immediately download the complete object once the manifest has
|
|
been successfully PUT. Being able to upload segments into separate containers
|
|
also improves concurrent upload speed. It has the disadvantage that the
|
|
manifest is finalized once PUT. Any changes to it means it has to be replaced.
|
|
|
|
Between these two methods the user has great flexibility in how (s)he chooses
|
|
to upload and retrieve large objects to swift. Swift does not, however, stop
|
|
the user from harming themselves. In both cases the segments are deletable by
|
|
the user at any time. If a segment was deleted by mistake, a dynamic large
|
|
object, having no way of knowing it was ever there, would happily ignore the
|
|
deleted file and the user will get an incomplete file. A static large object
|
|
would, when failing to retrieve the object specified in the manifest, drop the
|
|
connection and the user would receive partial results.
|