arguments are not locale decoded into Unicode
When openstackclient running under Python 2 passes command line arguments to a subcommand, it fails to pass them as text (i.e. Unicode). Instead it passes the arguments as binary data encoded in the current locale's encoding. An easy way to see this is to pass a username containing a non-ASCII character:

    % openstack user delete ñew
    No user with a name or ID of 'ñew' exists.

Internally, when the user data is retrieved it is properly represented as a Unicode object. However, the username passed from the command line is still a str object encoded in the locale's encoding (typically UTF-8). A comparison is then attempted between the encoded data from the command line and the Unicode text in the user representation. This seldom ends well: either the comparison fails to match or a codec error is raised.

There is a hard and fast rule: all text data must be stored in Unicode objects, and the conversion from binary encoded text to Unicode must occur as close to the I/O boundary as possible. Python 3 enforces this behavior automatically, but in Python 2 it is the programmer's job to do so.

In the past there have been attempts to fix such problems deep inside internal code by decoding from UTF-8. There are two problems with this approach. First, internal code has no way to know accurately which encoding was used to produce the binary data. This is why the data needs to be decoded as close to the I/O source as possible: that is the best place to know the actual encoding. Guessing UTF-8 is at best a heuristic. Second, there must be one canonical representation for text "inside" the program. You don't want dozens of individual modules, classes, methods, etc. performing conversions; they should be able to assume the format in which text is represented, and that format must be Unicode. This is another reason to decode as close to the I/O boundary as possible.
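The mismatch described above can be reproduced in a few lines. This is only a sketch: `USERS` and `find_user` are hypothetical stand-ins for the real lookup code, and UTF-8 is assumed as the locale encoding purely for the demonstration.

```python
NAME = u"\u00f1ew"  # the text "ñew"

# Hypothetical stand-in for user records retrieved from the service;
# the names are already Unicode text.
USERS = [u"admin", NAME]


def find_user(name):
    """Return the matching user name, or None if nothing matches."""
    for user in USERS:
        if user == name:
            return user
    return None


# What Python 2 places in sys.argv: bytes in the locale's encoding
# (assumed to be UTF-8 here).
raw_arg = NAME.encode("utf-8")

# Encoded bytes never compare equal to the Unicode text.
print(find_user(raw_arg))

# Decoding once, at the boundary, lets the lookup succeed.
print(find_user(raw_arg.decode("utf-8")) == NAME)
```

On Python 2 the first comparison may also emit a UnicodeWarning, which is the "codec error" flavor of the same failure.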
In Python 3 the argv strings are decoded from the locale's encoding by the interpreter, so by the time any Python 3 code sees them they are already Unicode. In Python 2, explicit code must be added to decode the argv strings into Unicode.

The conversion of sys.argv into Unicode only occurs when argv is not passed to OpenStackShell.run(). If a caller of OpenStackShell.run() supplies their own argv, it is their responsibility to ensure they are passing actual text objects. Consider this a requirement of the API.

Note: This patch does not contain a unit test to exercise the behavior because it is difficult to construct a test that depends on command invocation from a shell. The general structure of the unit tests is to pass a fake argv into OpenStackShell.run() as if it came from a shell. Because the new code only operates when argv is not passed and defaults to sys.argv, it conflicts with the unit test design.

Change-Id: I779d260744728eae8455ff9dedb6e5c09c165559
Closes-Bug: 1603494
Signed-off-by: John Dennis <jdennis@redhat.com>
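As a rough illustration of the decoding step, the logic can be factored into a small helper that is testable without a shell. `decode_argv` is a hypothetical name and is not part of openstackclient; unlike the patch, it checks each argument's type instead of the Python version, so it also leaves already-decoded text untouched.

```python
import locale


def decode_argv(argv, encoding=None):
    """Return a list where byte-string arguments are decoded to text.

    Hypothetical helper for illustration only. If no encoding is given,
    the locale's preferred encoding is used, as in the patch.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding()
    if not encoding:
        # No usable encoding reported: pass the arguments through unchanged.
        return list(argv)
    return [arg.decode(encoding) if isinstance(arg, bytes) else arg
            for arg in argv]


# Byte strings (what Python 2 supplies) are decoded; text is left alone.
args = decode_argv([b"user", b"delete", b"\xc3\xb1ew"], encoding="utf-8")
print(args == [u"user", u"delete", u"\u00f1ew"])
```

Passing the encoding explicitly, as in the usage line, keeps a test independent of the locale of the machine it runs on.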
parent 7a667d700f
commit 756d2fac67
@@ -18,7 +18,9 @@
 
 import argparse
 import getpass
+import locale
 import logging
+import six
 import sys
 import traceback
 
@@ -474,8 +476,17 @@ class OpenStackShell(app.App):
             tcmd.run(targs)
 
 
-def main(argv=sys.argv[1:]):
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv[1:]
+        if six.PY2:
+            # Emulate Py3, decode argv into Unicode based on locale so that
+            # commands always see arguments as text instead of binary data
+            encoding = locale.getpreferredencoding()
+            if encoding:
+                argv = map(lambda arg: arg.decode(encoding), argv)
+
     return OpenStackShell().run(argv)
 
 if __name__ == "__main__":
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(main())
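The diff also replaces the `argv=sys.argv[1:]` default with a `None` sentinel. One reason this matters: Python evaluates default argument values once, when the `def` statement runs, so the old signature could not tell "caller passed argv" apart from "use the shell's arguments". The sketch below uses two hypothetical functions, `main_eager` and `main_lazy`, to show the difference; neither is the real `main()`.

```python
import sys


def main_eager(argv=sys.argv[1:]):
    # The default list was built once, when this def statement executed.
    return argv


def main_lazy(argv=None):
    # None is a sentinel meaning "the caller supplied nothing".
    if argv is None:
        argv = sys.argv[1:]
    return argv


# Simulate sys.argv changing after the functions were defined.
saved = sys.argv
sys.argv = ["prog", "user", "list"]
try:
    eager_result = main_eager()
    lazy_result = main_lazy()
finally:
    sys.argv = saved

# The lazy version sees the current sys.argv; the eager one returns the
# list captured at definition time.
print(lazy_result)
print(eager_result is main_eager.__defaults__[0])
```

With the sentinel in place, the decoding branch can be limited to the case where the arguments really came from the shell.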