86eadd3864
not't corrected changed to not sqlite to SQLite Change-Id: Id9e29de7e872377bc29401a3bc0a5e3958a09690
220 lines
13 KiB
XML
220 lines
13 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
||
<chapter xmlns="http://docbook.org/ns/docbook"
|
||
xmlns:xi="http://www.w3.org/2001/XInclude"
|
||
xmlns:xlink="http://www.w3.org/1999/xlink"
|
||
version="5.0"
|
||
xml:id="module003-ch006-more-concepts">
|
||
<title>A Bit More On Swift</title>
|
||
<para><guilabel>Containers and Objects</guilabel></para>
|
||
<para>A container is a storage compartment for your data and
|
||
provides a way for you to organize your data. You can
|
||
think of a container as a folder in Windows or a
|
||
directory in UNIX. The primary difference between a
|
||
container and these other file system concepts is that
|
||
containers cannot be nested. You can, however, create an
|
||
unlimited number of containers within your account. Data
|
||
must be stored in a container so you must have at least
|
||
one container defined in your account prior to uploading
|
||
data.</para>
|
||
<para>The only restrictions on container names is that they
|
||
cannot contain a forward slash (/) or an ascii null (%00)
|
||
and must be less than 257 bytes in length. Please note
|
||
that the length restriction applies to the name after it
|
||
has been URL encoded. For example, a container name of
|
||
Course Docs would be URL encoded as Course%20Docs and
|
||
therefore be 13 bytes in length rather than the expected
|
||
11.</para>
|
||
<para>An object is the basic storage entity and any optional
|
||
metadata that represents the files you store in the
|
||
OpenStack Object Storage system. When you upload data to
|
||
OpenStack Object Storage, the data is stored as-is (no
|
||
compression or encryption) and consists of a location
|
||
(container), the object's name, and any metadata
|
||
consisting of key/value pairs. For instance, you may chose
|
||
to store a backup of your digital photos and organize them
|
||
into albums. In this case, each object could be tagged
|
||
with metadata such as Album : Caribbean Cruise or Album :
|
||
Aspen Ski Trip.</para>
|
||
<para>The only restriction on object names is that they must
|
||
be less than 1024 bytes in length after URL encoding. For
|
||
example, an object name of C++final(v2).txt should be URL
|
||
encoded as C%2B%2Bfinal%28v2%29.txt and therefore be 24
|
||
bytes in length rather than the expected 16.</para>
|
||
<para>The maximum allowable size for a storage object upon
|
||
upload is 5 gigabytes (GB) and the minimum is zero bytes.
|
||
You can use the built-in large object support and the
|
||
swift utility to retrieve objects larger than 5 GB.</para>
|
||
<para>For metadata, you should not exceed 90 individual
|
||
key/value pairs for any one object and the total byte
|
||
length of all key/value pairs should not exceed 4KB (4096
|
||
bytes).</para>
|
||
<para><guilabel>Language-Specific API
|
||
Bindings</guilabel></para>
|
||
<para>A set of supported API bindings in several popular
|
||
languages are available from the Rackspace Cloud Files
|
||
product, which uses OpenStack Object Storage code for its
|
||
implementation. These bindings provide a layer of
|
||
abstraction on top of the base REST API, allowing
|
||
programmers to work with a container and object model
|
||
instead of working directly with HTTP requests and
|
||
responses. These bindings are free (as in beer and as in
|
||
speech) to download, use, and modify. They are all
|
||
licensed under the MIT License as described in the COPYING
|
||
file packaged with each binding. If you do make any
|
||
improvements to an API, you are encouraged (but not
|
||
required) to submit those changes back to us.</para>
|
||
<para>The API bindings for Rackspace Cloud Files are hosted
|
||
at<link xlink:href="http://github.com/rackspace"
|
||
></link><link
|
||
xlink:href="http://github.com/rackspace"
|
||
>http://github.com/rackspace</link>. Feel free to
|
||
coordinate your changes through github or, if you prefer,
|
||
send your changes to cloudfiles@rackspacecloud.com. Just
|
||
make sure to indicate which language and version you
|
||
modified and send a unified diff.</para>
|
||
<para>Each binding includes its own documentation (either
|
||
HTML, PDF, or CHM). They also include code snippets and
|
||
examples to help you get started. The currently supported
|
||
API binding for OpenStack Object Storage are:</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>PHP (requires 5.x and the modules: cURL,
|
||
FileInfo, mbstring)</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>Python (requires 2.4 or newer)</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>Java (requires JRE v1.5 or newer)</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>C#/.NET (requires .NET Framework v3.5)</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>Ruby (requires 1.8 or newer and mime-tools
|
||
module)</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>There are no other supported language-specific bindings
|
||
at this time. You are welcome to create your own language
|
||
API bindings and we can help answer any questions during
|
||
development, host your code if you like, and give you full
|
||
credit for your work.</para>
|
||
<para><guilabel>Proxy Server</guilabel></para>
|
||
<para>The Proxy Server is responsible for tying together
|
||
the rest of the OpenStack Object Storage architecture.
|
||
For each request, it will look up the location of the
|
||
account, container, or object in the ring (see below)
|
||
and route the request accordingly. The public API is
|
||
also exposed through the Proxy Server.</para>
|
||
<para>A large number of failures are also handled in the
|
||
Proxy Server. For example, if a server is unavailable
|
||
for an object PUT, it will ask the ring for a hand-off
|
||
server and route there instead.</para>
|
||
<para>When objects are streamed to or from an object
|
||
server, they are streamed directly through the proxy
|
||
server to or from the user – the proxy server does not
|
||
spool them.</para>
|
||
<para>You can use a proxy server with account management
|
||
enabled by configuring it in the proxy server
|
||
configuration file.</para>
|
||
<para><guilabel>Object Server</guilabel></para>
|
||
<para>The Object Server is a very simple blob storage
|
||
server that can store, retrieve and delete objects
|
||
stored on local devices. Objects are stored as binary
|
||
files on the filesystem with metadata stored in the
|
||
file’s extended attributes (xattrs). This requires
|
||
that the underlying filesystem choice for object
|
||
servers support xattrs on files. Some filesystems,
|
||
like ext3, have xattrs turned off by default.</para>
|
||
<para>Each object is stored using a path derived from the
|
||
object name’s hash and the operation’s timestamp. Last
|
||
write always wins, and ensures that the latest object
|
||
version will be served. A deletion is also treated as
|
||
a version of the file (a 0 byte file ending with
|
||
“.ts”, which stands for tombstone). This ensures that
|
||
deleted files are replicated correctly and older
|
||
versions don’t magically reappear due to failure
|
||
scenarios.</para>
|
||
<para><guilabel>Container Server</guilabel></para>
|
||
<para>The Container Server’s primary job is to handle
|
||
listings of objects. It does not know where those
|
||
objects are, just what objects are in a specific
|
||
container. The listings are stored as SQLite database
|
||
files, and replicated across the cluster similar to
|
||
how objects are. Statistics are also tracked that
|
||
include the total number of objects, and total storage
|
||
usage for that container.</para>
|
||
<para><guilabel>Account Server</guilabel></para>
|
||
<para>The Account Server is very similar to the Container
|
||
Server, excepting that it is responsible for listings
|
||
of containers rather than objects.</para>
|
||
<para><guilabel>Replication</guilabel></para>
|
||
<para>Replication is designed to keep the system in a
|
||
consistent state in the face of temporary error
|
||
conditions like network outages or drive
|
||
failures.</para>
|
||
<para>The replication processes compare local data with
|
||
each remote copy to ensure they all contain the latest
|
||
version. Object replication uses a hash list to
|
||
quickly compare subsections of each partition, and
|
||
container and account replication use a combination of
|
||
hashes and shared high water marks.</para>
|
||
<para>Replication updates are push based. For object
|
||
replication, updating is just a matter of rsyncing
|
||
files to the peer. Account and container replication
|
||
push missing records over HTTP or rsync whole database
|
||
files.</para>
|
||
<para>The replicator also ensures that data is removed
|
||
from the system. When an item (object, container, or
|
||
account) is deleted, a tombstone is set as the latest
|
||
version of the item. The replicator will see the
|
||
tombstone and ensure that the item is removed from the
|
||
entire system.</para>
|
||
<para>To separate the cluster-internal replication traffic
|
||
from client traffic, separate replication servers can
|
||
be used. These replication servers are based on the
|
||
standard storage servers, but they listen on the
|
||
replication IP and only respond to REPLICATE requests.
|
||
Storage servers can serve REPLICATE requests, so an
|
||
operator can transition to using a separate
|
||
replication network with no cluster downtime.</para>
|
||
<para>Replication IP and port information is stored in the
|
||
ring on a per-node basis. These parameters will be
|
||
used if they are present, but they are not required.
|
||
If this information does not exist or is empty for a
|
||
particular node, the node's standard IP and port will
|
||
be used for replication.</para>
|
||
<para><guilabel>Updaters</guilabel></para>
|
||
<para>There are times when container or account data can
|
||
not be immediately updated. This usually occurs during
|
||
failure scenarios or periods of high load. If an
|
||
update fails, the update is queued locally on the file
|
||
system, and the updater will process the failed
|
||
updates. This is where an eventual consistency window
|
||
will most likely come in to play. For example, suppose
|
||
a container server is under load and a new object is
|
||
put in to the system. The object will be immediately
|
||
available for reads as soon as the proxy server
|
||
responds to the client with success. However, the
|
||
container server did not update the object listing,
|
||
and so the update would be queued for a later update.
|
||
Container listings, therefore, may not immediately
|
||
contain the object.</para>
|
||
<para>In practice, the consistency window is only as large
|
||
as the frequency at which the updater runs and may not
|
||
even be noticed as the proxy server will route listing
|
||
requests to the first container server which responds.
|
||
The server under load may not be the one that serves
|
||
subsequent listing requests – one of the other two
|
||
replicas may handle the listing.</para>
|
||
<para><guilabel>Auditors</guilabel></para>
|
||
<para>Auditors crawl the local server checking the
|
||
integrity of the objects, containers, and accounts. If
|
||
corruption is found (in the case of bit rot, for
|
||
example), the file is quarantined, and replication
|
||
will replace the bad file from another replica. If
|
||
other errors are found they are logged. For example,
|
||
an object’s listing cannot be found on any container
|
||
server it should be.</para>
|
||
</chapter> |