https://tools.ietf.org/html/rfc7232#section-3 defines the form for
If-Match and If-None-Match as
If-Match = "*" / 1#entity-tag
If-None-Match = "*" / 1#entity-tag
https://tools.ietf.org/html/rfc7230#section-7 in turn defines the
1#<type> syntax as
1#element => element *( OWS "," OWS element )
where OWS is *optional* whitespace. Our swob.Match object should respect
that optionality.
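A minimal sketch of a parser that tolerates the optional whitespace. The helper name parse_etag_list is hypothetical and this is not the actual swob.Match code; it also assumes no entity-tag contains a comma.

```python
def parse_etag_list(header_value):
    """Split an If-Match / If-None-Match value into a set of tags.

    Hypothetical helper for illustration only; assumes that no
    entity-tag contains a comma.
    """
    tags = set()
    for element in header_value.split(','):
        # OWS is optional, so drop any whitespace around each element.
        element = element.strip()
        if len(element) >= 2 and element[0] == '"' and element[-1] == '"':
            element = element[1:-1]
        if element:
            tags.add(element)
    return tags
```

With this, '"a","b"' and '"a" , "b"' parse to the same set of tags.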
Change-Id: I6ee1c6674e0e9c156149319022fd289504bd3722
In some unit tests instead of self.fail(msg) statements
self.assertTrue(False, msg) were used, which might be ambiguous.
Using assertTrue(False, msg) gives the following message on fail:
File "C:\Python361\lib\unittest\case.py", line 678, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true : msg
'False is not true' message implies that unit test failed (as the
result is False while we asserted True).
Replacing it with self.fail(msg) is less ambiguous and more readable:
File "C:\Python361\lib\unittest\case.py", line 666, in fail
raise self.failureException(msg)
AssertionError: msg
TrivialFix
Change-Id: Ib56a0ed8549fd7af2724eb59222106888781e9c8
Following OpenStack Style Guidelines:
[1] http://docs.openstack.org/developer/hacking/#unit-tests-and-assertraises
[H203] Unit test assertions tend to give better messages for more specific
assertions. As a result, assertIsNone(...) is preferred over
assertEqual(None, ...) and assertIs(..., None)
Change-Id: If4db8872c4f5705c1fff017c4891626e9ce4d1e4
If a user sends a Range header with no satisfiable ranges, we send back
a 416 Requested Range Not Satisfiable response. Previously however,
there would be no indication of the size of the object they were
requesting, so they wouldn't know how to craft a satisfiable range. We
*do* send a Content-Length, but it is (correctly) the length of the
error message.
The RFC [1] has an answer for this:
> A server generating a 416 (Range Not Satisfiable) response to a
> byte-range request SHOULD send a Content-Range header field with an
> unsatisfied-range value, as in the following example:
>
> Content-Range: bytes */1234
>
> The complete-length in a 416 response indicates the current length of
> the selected representation.
Now, we'll send a Content-Range header for all 416 responses, including
those coming from the object server as well as those generated on a
proxy because of the Range mangling required to support EC policies.
[1] RFC 7233, section 4.2, although similar language was used in RFC
2616, sections 10.4.17 and 14.16
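As a rough illustration of the header described above, assuming a hypothetical helper name and that the object's current length is already known:

```python
def unsatisfiable_range_headers(object_length):
    """Build the extra header for a 416 response.

    Hypothetical helper; object_length is the current length of the
    selected representation, as RFC 7233 section 4.2 describes.
    """
    return {'Content-Range': 'bytes */%d' % object_length}
```

A client that receives this header learns the object size and can retry with a satisfiable range.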
Change-Id: I80c7390fc6f84a10a212b0641bb07a64dfccbd45
This is consistent with what we already do for other
semantically-invalid values like "bytes=--1". Previously, we would
return a 416 Requested Range Not Satisfiable response like we do for the
semantically-valid-but-not-really-meaningful "bytes=-0"
Change-Id: I932b42406c9a5ee7eaa6978a655e61022a957415
There are a few issues in
controllers/base.py:_get_response_parts_iter(). One is that the
"raise" statement that attempts to re-raise the GeneratorExit may
re-raise ValueError instead if that was the last exception caught.
Secondly, the range may not actually be set in the backend_headers
(need to investigate further, as that could actually be faulty tests,
since learn_size_from_content_range should always set it). The patch
changes the Range construction to throw a ValueError if None is passed
in that case.
Lastly, the range may be only half-defined, e.g. bytes=0-. In that
case, the check of how many bytes are expected vs how many bytes have
been sent does not make sense.
Change-Id: Ida5adf3d33c736240b2c4bae5510b5289f03dee2
Bonus consistency: 416 responses now always have a body. Before, if
you had "swob.HTTPRequestedRangeNotSatisfiable()", you'd get a body,
but if you had "swob.Response(..., conditional_response=True)", then
you'd get a length-0 response body. Now you always get a response
body. It's just the default <html><h1>..., but at least it's always there.
Bonus efficiency: do a little caching of sub-SLO manifests to avoid
needless re-fetches. This kicks in when there are multiple references
to the same sub-SLO in a given manifest. The caching only holds 20
sub-SLOs so that a malicious user can't build a giant SLO tree and use
it to run the proxy out of memory (we're already holding up to 10
manifests in memory at a time since a SLO can include another SLO to a
depth of 10; this doesn't make the situation too much worse).
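A sketch of a size-bounded cache of this kind, for Python 3. The class name, the fetch callback and the least-recently-used eviction policy are assumptions; the commit only states the limit of 20 entries.

```python
from collections import OrderedDict


class BoundedManifestCache(object):
    """Hypothetical bounded cache; LRU eviction is an assumption."""

    def __init__(self, max_entries=20):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch(key) on a miss."""
        if key in self._entries:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = fetch(key)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            # Drop the least recently used entry to cap memory use.
            self._entries.popitem(last=False)
        return value
```

Because the cache never grows past max_entries, a deep or wide SLO tree cannot make it hold an unbounded number of manifests.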
Change-Id: I24716e3271cf3370642e3755447e717fd7d9957c
Rewrite server side copy and 'object post as copy' feature as middleware to
simplify the PUT method in the object controller code. COPY is no longer
a verb implemented as a public method in the proxy application.
The server side copy middleware is inserted to the left of dlo, slo and
versioned_writes middlewares in the proxy server pipeline. As a result,
dlo and slo copy_hooks are no longer required. SLO manifests are now
validated when copied so when copying a manifest to another account the
referenced segments must be readable in that account for the manifest
copy to succeed (previously this validation was not made, meaning the
manifest was copied but could be unusable if the segments were not
readable).
With this change, there should be no change in functionality or existing
behavior. This is asserted with (almost) no changes required to existing
functional tests.
Some notes (for operators):
* The middleware must be auto-inserted before slo, dlo and
versioned_writes.
* Turning off server side copy is not configurable.
* object_post_as_copy is no longer a configurable option of proxy server
but of this middleware. However, for smooth upgrade, config option set
in proxy server app is also read.
DocImpact: Introducing server side copy as middleware
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Change-Id: Ic96a92e938589a2f6add35a40741fd062f1c29eb
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Signed-off-by: Thiago da Silva <thiago@redhat.com>
If the client asked for "Range: bytes=--123", Swift would respond with
a 206 and a Content-Length of -123. Now that Range header is ignored
just like all kinds of other invalid Range headers.
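One way to reject such values is to accept only unsigned digits on either side of the dash. The sketch below is an illustration under that assumption, with a hypothetical function name; it is not Swift's actual Range parser.

```python
import re

# Digits only on both sides, so a second "-" can never match.
_RANGE_SPEC = re.compile(r'(\d*)-(\d*)', re.ASCII)


def parse_byte_ranges(header_value):
    """Return a list of (start, end) pairs, or None to ignore the header.

    start is None for a suffix range such as "bytes=-123"; end is None
    for an open-ended range such as "bytes=0-".
    """
    if not header_value.startswith('bytes='):
        return None
    ranges = []
    for spec in header_value[len('bytes='):].split(','):
        match = _RANGE_SPEC.fullmatch(spec.strip())
        if match is None:
            return None  # e.g. "--123"
        start, end = match.groups()
        if not start and not end:
            return None  # a bare "-" carries no information
        start = int(start) if start else None
        end = int(end) if end else None
        if start is not None and end is not None and end < start:
            return None
        ranges.append((start, end))
    return ranges
```

Returning None stands for "ignore the header and serve the whole object", which is distinct from a well-formed but unsatisfiable range.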
Change-Id: I30d4522d223076ce342d20c52f57ff0eb2aea1f4
Closes-Bug: 1571106
There was a function in swift.common.utils that was importing
swob.HeaderKeyDict at call time. It couldn't import it at compilation
time since utils can't import from swob or else it blows up with a
circular import error.
This commit just moves HeaderKeyDict into swift.common.header_key_dict
so that we can remove the inline import.
Change-Id: I656fde8cc2e125327c26c589cf1045cb81ffc7e5
The proxy server currently requires Content-Length in the response
headers when getting an object and does not support chunked transfer
with "Transfer-Encoding: chunked".
This doesn't matter in normal swift, but it prevents us from adding
any middleware that does something like streaming processing of
objects, since such middleware can't calculate the length of its
response body before it starts to send the response.
Change-Id: I60fc6c86338d734e39b7e5f1e48a2647995045ef
The content-type header inserted into a multipart message part
is missing any params such as charset because its value is being
fetched via the swob.Response content_type property, which conforms
to webob spec and strips off all params.
This was noticed in work on feature/crypto branch because the crypto
meta param was being stripped off content-type in multipart messages,
preventing the content-type being decrypted. But in general there is
no suggestion in the multipart message spec [1] that params should
not be included. In fact examples in [1] show the charset param
included in the content-type value.
To ensure that the multipart message part content-type includes the
original content-type params, fetch it directly from the response
headers.
[1] http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
Change-Id: Iff7274aa631a92cd7332212ed8b4378c27da4a1f
assertEquals is deprecated in py3; replace it with assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
The urllib, urllib2 and urlparse modules of Python 2 were reorganized
into a new urllib namespace on Python 3. Replace urllib, urllib2 and
urlparse imports with six.moves.urllib to make the modified code
compatible with Python 2 and Python 3.
The initial patch was generated by the urllib operation of the sixer
tool on: bin/* swift/ test/.
Change-Id: I61a8c7fb7972eabc7da8dad3b3d34bceee5c5d93
* HeaderEnvironProxy: replace UserDict.DictMixin with
collections.MutableMapping, add __iter__ and __len__ methods, and
add more unit tests
* Replace url* imports with six.moves.urllib
Change-Id: I9ed22d0dd52ee7ac8fa16571f82c45975cfdffff
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
wsgi.input is a binary stream (bytes), not a text stream (unicode).
* Replace StringIO with BytesIO for WSGI input
* Replace StringIO('') with StringIO() and replace WsgiStringIO('') with
WsgiStringIO(): an empty string is already the default value
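For illustration, a hand-built WSGI environ with a bytes body might look like this. The helper name and the choice of keys are assumptions, not code from the patch.

```python
from io import BytesIO


def make_environ(body=b''):
    """Build a tiny WSGI environ whose input stream yields bytes."""
    return {
        'REQUEST_METHOD': 'PUT',
        'CONTENT_LENGTH': str(len(body)),
        # wsgi.input must be a binary stream, hence BytesIO.
        'wsgi.input': BytesIO(body),
    }
```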
Change-Id: I09c9527be2265a6847189aeeb74a17261ddc781a
* replace "from cStringIO import StringIO"
with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
with "import six" and "six.StringIO()"
This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
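A small sketch of the pattern, using a hypothetical iterator class that defines both spellings so the builtin works on either interpreter:

```python
class Countdown(object):
    """Hypothetical iterator counting down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 spelling of the same method


it = Countdown(2)
first = next(it)  # builtin next(), not it.next()
```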
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
This commit lets clients receive multipart/byteranges responses (see
RFC 7233, Appendix A) for erasure-coded objects. Clients can already
do this for replicated objects, so this brings EC closer to feature
parity (ha!).
GetOrHeadHandler got a base class extracted from it that treats an
HTTP response as a sequence of byte-range responses. This way, it can
continue to yield whole fragments, not just N-byte pieces of the raw
HTTP response, since an N-byte piece of a multipart/byteranges
response is pretty much useless.
There are a couple of bonus fixes in here, too. For starters, download
resuming now works on multipart/byteranges responses. Before, it only
worked on 200 responses or 206 responses for a single byte
range. Also, BufferedHTTPResponse grew a readline() method.
Also, the MIME response for replicated objects got tightened up a
little. Before, it had some leading and trailing CRLFs which, while
allowed by RFC 7233, provide no benefit. Now, both replicated and EC
multipart/byteranges avoid extraneous bytes. This let me re-use the
Content-Length calculation in swob instead of having to either hack
around it or add extraneous whitespace to match.
Change-Id: I16fc65e0ec4e356706d327bdb02a3741e36330a0
This lets the proxy server send object metadata to the object server
after the object data. This is necessary for EC, as it allows us to
compute the etag of the object in the proxy server and still store it
with the object.
The wire format is a multipart MIME document. For sanity during a
rolling upgrade, the multipart MIME document is only sent to the
object server if it indicates, via 100 Continue header, that it knows
how to consume it.
Example 1 (new proxy, new obj server):
proxy: PUT /p/a/c/o
X-Backend-Obj-Metadata-Footer: yes
obj: 100 Continue
X-Obj-Metadata-Footer: yes
proxy: --MIMEmimeMIMEmime...
Example 2 (new proxy, old obj server):
proxy: PUT /p/a/c/o
X-Backend-Obj-Metadata-Footer: yes
obj: 100 Continue
proxy: <obj body>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Id38f7e93e3473f19ff88123ae0501000ed9b2e89
This patch fixes swob.Response to set missing content
length correctly.
When a child class of swob.Response is initialized with both
"body" and "headers" arguments, and the headers include a content
length, swob.Response might lose the actual content length
generated from the body because "headers" will overwrite the
content length property after the body assignment.
This causes a mismatch between the content length in the headers
and the actual body length. It mainly affects 3rd party
middleware that builds its own response as follows:
req = swob.Request.blank('/')
req.method = 'HEAD'
resp = req.get_response(app)
return HTTPOk(body='Ok', headers=resp.headers)
This patch changes the order in which the headers are updated so
that init() sets the correct content length.
Change-Id: Icd8b7cbfe6bbe2c7965175969af299a5eb7a74ef
RFC 7233 says that servers MAY reject egregious range-GET requests
such as requests with hundreds of ranges, requests with non-ascending
ranges, and so on.
Such requests are fairly hard for Swift to process. Consider a Range
header that asks for the first byte of every 10th MiB in a 4 GiB
object, but in some random order. That'll cause a lot of seeks on the
object server, but the corresponding response body is quite small in
comparison to the workload.
This commit makes Swift reject, with a 416 response, any ranged GET
request with more than fifty ranges, more than three overlapping
ranges, or more than eight non-increasing ranges.
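The three limits could be checked roughly as follows. This is a sketch only: the constant and function names are hypothetical, ranges are assumed to be already resolved to inclusive (start, end) byte offsets, and the exact way overlaps and non-increasing ranges are counted is an assumption rather than Swift's actual rule.

```python
MAX_RANGES = 50
MAX_OVERLAPPING = 3
MAX_NONINCREASING = 8


def count_overlapping(ranges):
    """Count ranges that overlap an earlier range once sorted by start."""
    overlaps = 0
    furthest_end = -1
    for start, end in sorted(ranges):
        if start <= furthest_end:
            overlaps += 1
        furthest_end = max(furthest_end, end)
    return overlaps


def count_nonincreasing(ranges):
    """Count ranges that do not start after their predecessor."""
    return sum(1 for prev, cur in zip(ranges, ranges[1:])
               if cur[0] <= prev[0])


def is_egregious(ranges):
    """True if the request should be rejected with a 416."""
    return (len(ranges) > MAX_RANGES
            or count_overlapping(ranges) > MAX_OVERLAPPING
            or count_nonincreasing(ranges) > MAX_NONINCREASING)
```

All three checks are linear or n log n in the number of ranges, so the check itself stays cheap even for a hostile header.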
This is a necessary prerequisite for supporting multi-range GETs on
large objects. Otherwise, a malicious user could construct a Range
header with hundreds of byte ranges where each individual byterange
requires the proxy to contact a different object server. If seeking
all over a disk is bad, connecting all over the cluster is way worse.
DocImpact
Change-Id: I4dcedcaae6c3deada06a0223479e611094d57234
The normalized form of the X-Timestamp header looks like a float with a fixed
width to ensure stable string sorting - normalized timestamps look like
"1402464677.04188"
To support overwrites of existing data without modifying the original
timestamp but still maintain consistency a second internal offset
vector is appended to the normalized timestamp form which compares and
sorts greater than the fixed width float format but less than a newer
timestamp. The internalized format of timestamps looks like
"1402464677.04188_0000000000000000" - the portion after the underscore
is the offset and is a formatted hexadecimal integer.
The internalized form is not exposed to clients in responses from Swift.
Normal client operations will not create a timestamp with an offset.
The Timestamp class in common.utils supports internalized and normalized
formatting of timestamps and also comparison of timestamp values. When the
offset value of a Timestamp is 0 - it's considered insignificant and need not
be represented in the string format; to support backwards compatibility during
a Swift upgrade the internalized and normalized form of a Timestamp with an
insignificant offset are identical. When a timestamp includes an offset it
will always be represented in the internalized form, but is still excluded
from the normalized form. Timestamps with an equivalent timestamp portion
(the float part) will compare and order by their offset. Timestamps with a
greater timestamp portion will always compare and order greater than a
Timestamp with a lesser timestamp regardless of its offset. String
comparison and ordering is guaranteed for the internalized string format, and
is backwards compatible for normalized timestamps which do not include an
offset.
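The two string forms and their ordering can be sketched as below. The class name SketchTimestamp is hypothetical and this is not the Timestamp class from common.utils; the format strings are inferred from the examples given above.

```python
import functools


@functools.total_ordering
class SketchTimestamp(object):
    """Hypothetical stand-in illustrating the two string forms."""

    def __init__(self, timestamp, offset=0):
        self.timestamp = float(timestamp)
        self.offset = int(offset)

    @property
    def normal(self):
        # Fixed width, five decimal places; the offset is never shown.
        return '%016.5f' % self.timestamp

    @property
    def internal(self):
        # An offset of 0 is insignificant and is left out.
        if self.offset:
            return '%s_%016x' % (self.normal, self.offset)
        return self.normal

    def _key(self):
        return (self.timestamp, self.offset)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()
```

Because the offset suffix only extends the fixed-width prefix, plain string comparison of the internal forms agrees with the object comparison.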
The reconciler currently uses an offset bump to ensure that objects can move to
the wrong storage policy and be moved back. This use-case is valid because
the content represented by the user-facing timestamp is not modified in any way.
Future consumers of the offset vector of timestamps should be mindful of HTTP
semantics of If-Modified and take care to avoid deviation in the response from
the object server without an accompanying change to the user facing timestamp.
DocImpact
Implements: blueprint storage-policies
Change-Id: Id85c960b126ec919a481dc62469bf172b7fb8549
HTTP header values should be quoted. Since the WWW-Authenticate
header value contains user-supplied strings, it's important to
ensure it's properly quoted to ensure the integrity of the protocol.
Previous to this patch, the URL was unquoted and then the unquoted
value was returned in the header. This patch re-quotes the value
when it is set on the response.
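A minimal sketch of the re-quoting step on Python 3, assuming a hypothetical helper name and the 'Swift realm="..."' header format; it is not the patched code itself.

```python
from urllib.parse import quote


def www_authenticate_value(realm):
    """Percent-encode a user-supplied realm before it enters a header."""
    # quote() encodes '"', CR and LF, so the value cannot break out of
    # the quoted string or inject a new header line.
    return 'Swift realm="%s"' % quote(realm)
```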
This is filed as CVE-2014-3497
Fixes bug 1327414
Change-Id: If8bd8842f2ce821756e9b4461a18a8ac8d42fb8c
We already supported it for object GET requests, but not for
HEAD. This lets clients keep metadata up-to-date without having to
either fetch the whole object when it's changed or do their own date
parsing. They can just treat Last-Modified as opaque and update their
idea of metadata when they get a 200.
Change-Id: Iff25d8989a93d651fd2c327e1e58036e79e1bde1
If I want to fetch an object only if it is newer than the first moon
landing, I send a GET request with header:
If-Modified-Since: Sun, 20 Jul 1969 20:18:00 UTC
Since that date is older than Swift, I expect a 2xx response. However,
I get a 412, which isn't even a valid thing to do for
If-Modified-Since; it should either be 2xx or 304. This is because of
two problems:
(a) Swift treats pre-1970 dates as invalid, and
(b) Swift returns 412 when a date is invalid instead of ignoring it.
This commit makes it so any time between datetime.datetime.min and
datetime.datetime.max is an acceptable value for If-Modified-Since and
If-Unmodified-Since. Dates outside that date range are treated as
invalid headers and thus are ignored, as RFC 2616 section 14.28
requires ("If the specified date is invalid, the header is ignored").
This only works for dates that the Python standard library can parse,
which on my machine is 01 Jan 1 to 31 Dec 9999. Eliminating those
restrictions would require implementing our own date parsing and
comparison, and that's almost certainly not worth it.
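The "ignore invalid dates" behavior can be sketched with the standard library on Python 3. The function names are hypothetical, and last_modified is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Return a datetime for an HTTP date, or None if it is invalid."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError, OverflowError):
        return None
    if parsed.tzinfo is None:
        # Treat dates without zone information as UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_modified_since(last_modified, header_value):
    """True if the object should be sent (2xx), False for a 304."""
    since = parse_http_date(header_value)
    if since is None:
        return True  # an invalid header is ignored
    return last_modified > since


moon_landing = 'Sun, 20 Jul 1969 20:18:00 UTC'
last_modified = datetime(2014, 2, 1, tzinfo=timezone.utc)
send_object = is_modified_since(last_modified, moon_landing)
```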
Change-Id: I4cb4903c4e5e3b6b3c9506c2cabbfbda62e82f35
I moved the checking of If-Match and If-None-Match out of the object
server's GET method and into swob so that everyone can use it. The
interface is similar to the Range handling; make a response with
conditional_response=True, and you get handling of If-Match and
If-None-Match.
Since the only users of conditional_response are object GET, object
HEAD, SLO, and DLO, this has the effect of adding support for If-Match
and If-None-Match to just the latter three places and nowhere
else. This makes object GET and HEAD consistent for any kind of
object, large or small.
This also fixes a bug where various conditional headers (If-*) were
passed through to the object server on segment requests, which could
cause segment requests to fail with a 304 or 412 response. Now only
certain headers are copied to the segment requests, and that doesn't
include the conditional ones, so they can't goof up the segment
retrieval.
Note that I moved SegmentedIterable to swift.common.request_helpers
because it sprouted a transitive dependency on swob, and leaving it in
utils caused a circular import.
Bonus fix: unified the handling of DiskFileQuarantined and
DiskFileNotFound in object server GET and HEAD. Now in either case, a
412 will be returned if the client said "If-Match: *". If not, the
response is a 404, just like before.
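The resulting decision logic can be sketched as follows. The function is hypothetical: it takes already-parsed sets of tags, uses etag=None to stand for a missing or quarantined object, and covers only the cases mentioned above.

```python
def conditional_status(etag, if_match=None, if_none_match=None):
    """Return the status code for an object GET or HEAD.

    if_match and if_none_match are sets of entity-tags (or None when
    the header is absent); etag is None when the object is unavailable.
    """
    if etag is None:
        # Missing object: "If-Match: *" cannot be satisfied.
        return 412 if if_match and '*' in if_match else 404
    if if_match is not None:
        if '*' not in if_match and etag not in if_match:
            return 412
    if if_none_match is not None:
        if '*' in if_none_match or etag in if_none_match:
            return 304
    return 200
```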
Closes-Bug: 1279076
Closes-Bug: 1280022
Closes-Bug: 1280028
Change-Id: Id2ee78346244d516b980202e990aa38ce6812de5
This is needed for SOS (along with patch
https://github.com/dpgoetz/sos/pull/37)
to work with swift 1.12. By spec you should always use the absolute
location but this causes a problem with staticweb over a cdn using a
cname. Basically you want to be able to forward the browser to a
relative location instead of whatever full url the proxy server
thinks you are using.
Change-Id: I3fa1d415bf9b566be069458b838f7e65db0c4f39
The proxy server was calling swob.Request.path_info_pop() prior to
instantiating a controller so that req.path_info was just /a/c/o (sans
/v1). The version got moved over into SCRIPT_NAME.
This led to some unfortunate behavior when trying to re-use a request
from middleware. Something like this:
    # Imagine we're a WSGIContext object here.
    #
    # To start, SCRIPT_NAME = '' and PATH_INFO='/v1/a/c/o'
    resp_iter = self._app_call(env, start_response)
    # Now SCRIPT_NAME='/v1' and PATH_INFO ='/a/c/o'
    if something_special in self._response_headers:
        env['REQUEST_METHOD'] = 'GET'
        env.pop('HTTP_RANGE', None)
        # 404 SURPRISE! The proxy calls path_info_pop() again,
        # and now SCRIPT_NAME='/v1/a' and PATH_INFO='/c/o', so this
        # gets treated as a container request. Yikes.
        resp_iter = self._app_call(env, start_response)
Now we just leave SCRIPT_NAME and PATH_INFO alone. To make life easy
for everyone who does want just /a/c/o, I defined
swob.Request.swift_entity_path, which just strips off the /v1.
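The idea is to compute the stripped path without mutating the environ. A free-function sketch under that assumption (the real swift_entity_path lives on swob.Request, and the name below is hypothetical):

```python
def entity_path_sketch(path_info):
    """Return '/a/c/o' for '/v1/a/c/o'; None if nothing follows the version.

    Reads the path only; SCRIPT_NAME and PATH_INFO stay untouched.
    """
    _version, sep, rest = path_info.lstrip('/').partition('/')
    if not sep:
        return None
    return '/' + rest
```

Since nothing is popped from the environ, the same request can be sent through the proxy a second time and still be routed as an object request.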
Note that there's still one call to path_info_pop() in tempauth, but
that's only for requests going to /auth, so it won't affect Swift API
requests. It might be a good idea to remove that one later, but let's
do one thing at a time.
Change-Id: I87557a11c01f3f3889b610578cda6ba7d3933e7a
PEP 333 (WSGI) says that if your iterable has a close() method, the
framework must call it.
WSGIContext._app_call pulls the first chunk off the returned iterable
to make sure that it gets status and headers, and then it would
itertools.chain() that first chunk back onto the iterable so the whole
body went out. swob.Response.call_application() does it too.
The problem is that an itertools.chain object doesn't have a close()
method, so your iterable's fancy-pants close() method has no chance of
getting called.
This patch adds a slightly smarter CloseableChain that works like
itertools.chain, but has a close() method that calls the underlying
iterables' close() methods, if any.
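The behavior described can be sketched in a few lines. The class below uses a hypothetical name and is an illustration of the idea, not the CloseableChain added by the patch.

```python
class SketchCloseableChain(object):
    """Chain iterables like itertools.chain, but keep close() reachable."""

    def __init__(self, *iterables):
        self.iterables = iterables

    def __iter__(self):
        for iterable in self.iterables:
            for chunk in iterable:
                yield chunk

    def close(self):
        # Forward close() to every underlying iterable that has one.
        for iterable in self.iterables:
            close_method = getattr(iterable, 'close', None)
            if callable(close_method):
                close_method()
```

The first chunk can be passed as a one-element list, which has no close() and is simply skipped when the chain is closed.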
Change-Id: If975c93f53c27dfa0c2f52f4bbf599af25202f70
Per http://www.ietf.org/rfc/rfc2616.txt, when a 401 error is returned, the
Www-Authenticate response header MUST also be returned. The format is
described in http://www.ietf.org/rfc/rfc2617.txt.
Swift supports and/or implements a number of authentication schemes
including tempauth, Keystone, tempurl, formpost and container sync. In
this fix, we use a catch-all, "Swift". The realm is the account (where
known) or "unknown" (bad path or where the 401 is returned from code
that does not have the request). Examples:
Www-Authenticate: Swift realm="AUTH_1234567889"
Www-Authenticate: Swift realm="unknown"
Fixes bug #1215491
Change-Id: I03362789318dfa156d3733ef9348795062a9cfc4
"except x,y:" was deprecated and is removed in Python 3.x.
Use "except x as y:" instead which works in any Python
version >= 2.6.
Change-Id: I7008c74b807340f3457d3a0c8bd0b83f23169d14
Fixes a warning triggered by Hacking 0.7.x or newer. There
is no need to use a positional string formatting here, since
this is not going to be localized.
Change-Id: Ie38d620aecb0b48cd113af45cc9ca0d61f8f8ff1
I knew webob.Request.blank could take most of the attributes on the class as
kwargs to blank, so I went and looked how. It seems to work ok and is pretty
nice.
Change-Id: I72fae7c28f81c97768ee98b8ebcd69789a4c2e84
You'd think this would just work, given that HeaderKeyDict inherits
from dict and overrides the usual __thingy__ methods, but it
doesn't. It would work if you title-cased the key, but the whole point
of HeaderKeyDict is to do that for you.
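The underlying gotcha is that built-in dict methods such as get() and pop() do not go through an overridden __getitem__, so each one needs its own override. The sketch below shows the general pattern for Python 3 with a hypothetical class name; the commit does not say which method it fixes, and this is not the real HeaderKeyDict.

```python
class SketchHeaderKeyDict(dict):
    """Hypothetical dict that title-cases every key it is given."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def update(self, *args, **kwargs):
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(key.title(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.title())

    def __contains__(self, key):
        return super().__contains__(key.title())

    # dict.get and dict.pop bypass __getitem__, so override them too.
    def get(self, key, default=None):
        return super().get(key.title(), default)

    def pop(self, key, *default):
        return super().pop(key.title(), *default)
```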
Change-Id: If5c22df0690a245d1dd02fa3a52fa135235fe60d