<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!ENTITY % openstack SYSTEM "../common/entities/openstack.ent">
%openstack;
]>
<chapter xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns="http://docbook.org/ns/docbook"
  version="5.0"
  xml:id="ch052_devices">
<?dbhtml stop-chunking?>
<title>Hardening the virtualization layers</title>
<para>
At the beginning of this chapter, we discuss the use of both
physical and virtual hardware by instances, the associated
security risks, and some recommendations for mitigating those
risks. We conclude the chapter with a discussion of sVirt, an
open source project for integrating SELinux mandatory access
controls with the virtualization components.</para>
<section xml:id="ch052_devices-idp479920">
<title>Physical hardware (PCI passthrough)</title>
<para>
Many hypervisors offer a feature known as PCI
passthrough. This allows an instance to have direct access to
a piece of hardware on the node. For example, this could be
used to allow instances to access video cards offering the
compute unified device architecture (CUDA) for
high-performance computation. This feature carries two types of
security risks: direct memory access and hardware
infection.</para>
<para>
Direct memory access (DMA) is a feature that permits certain
hardware devices to access arbitrary physical memory addresses
in the host computer. Video cards often have this
capability. However, an instance should not be given arbitrary
physical memory access because this would give it a full view of
both the host system and other instances running on the same
node. Hardware vendors provide an input/output memory management
unit (IOMMU) to manage DMA access in these
situations. Therefore, cloud architects should ensure that the
hypervisor is configured to use this hardware
feature.</para>
<itemizedlist>
<listitem>
<para>KVM: <link
xlink:href="http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM">How
to assign devices with VT-d in KVM</link></para>
</listitem>
<listitem>
<para>Xen: <link xlink:href="http://wiki.xen.org/wiki/VTd_HowTo">VTd Howto</link>
</para>
</listitem>
</itemizedlist>
<note>
<para>
The IOMMU feature is marketed as VT-d by Intel and AMD-Vi by
AMD.</para>
</note>
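<para>
As a minimal sketch of how an operator might confirm that the
kernel has activated the IOMMU before enabling passthrough, the
following Python function lists the IOMMU groups exposed under
<filename>/sys/kernel/iommu_groups</filename>. It assumes a Linux
host whose kernel exposes that sysfs directory. The function name
<literal>list_iommu_groups</literal> is illustrative and is not
part of any OpenStack or hypervisor API.</para>
<programlisting language="python"><![CDATA[import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return the IOMMU group names found under root, sorted numerically.

    An empty list suggests that the IOMMU is disabled or unsupported.
    """
    if not os.path.isdir(root):
        return []
    # Each active IOMMU group appears as a numbered subdirectory.
    groups = [name for name in os.listdir(root) if name.isdigit()]
    return sorted(groups, key=int)


if __name__ == "__main__":
    groups = list_iommu_groups()
    if groups:
        print("IOMMU appears active: %d group(s) found" % len(groups))
    else:
        print("No IOMMU groups found")]]></programlisting>
<para>
An empty result suggests that the IOMMU is disabled in firmware
or on the kernel command line, in which case PCI passthrough
should not be enabled on that node.</para>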
<para>
A hardware infection occurs when an instance makes a malicious
modification to the firmware or some other part of a
device. As this device is used by other instances, or even the
host OS, the malicious code can spread into these systems. The
end result is that one instance can run code outside of its
security domain. This is a potential problem in any hardware
sharing scenario. It is a particular concern with PCI passthrough
because it is harder to reset the state of physical hardware
than virtual hardware.</para>
<para>
Solutions to the hardware infection problem are domain
specific. The strategy is to identify how an instance can
modify hardware state, then determine how to reset any
modifications when the instance is done using the
hardware. For example, one option could be to re-flash the
firmware after use. There is a need to balance
hardware longevity with security, as some firmware will fail
after a large number of writes. TPM technology, described in
<xref linkend="ch013_node-bootstrapping-idp44768"/>, provides
a solution for detecting unauthorized firmware
changes. Regardless of the strategy selected, it is important
to understand the risks associated with this kind of hardware
sharing so that they can be properly mitigated for a given
deployment scenario.
</para>
<para>
Additionally, due to the risk and complexities associated with
PCI passthrough, it should be disabled by default. If it is enabled
for a specific need, put appropriate
processes in place to ensure the hardware is clean before
re-issue.</para>
</section>
<section xml:id="ch052_devices-idp488320">
<title>Virtual hardware (QEMU)</title>
<para>
When running a virtual machine, virtual hardware is a software
layer that provides the hardware interface for the virtual
machine. Instances use this functionality to provide network,
storage, video, and other devices that may be needed. Most
instances in your environment will
exclusively use virtual hardware, with a minority that will
require direct hardware access. The major open source
hypervisors use QEMU for this functionality. While QEMU fills
an important need for virtualization platforms, it has proven
to be a very challenging software project to write and
maintain. Much of the functionality in QEMU is implemented
with low-level code that is difficult for most developers to
comprehend. Furthermore, the hardware virtualized by QEMU
includes many legacy devices that have their own set of
quirks. As a result, QEMU has been the source
of many security problems, including hypervisor breakout
attacks.</para>
<para>
For the reasons stated above, it is important to take
proactive steps to harden QEMU. We recommend three specific
steps: minimizing the code base, using compiler hardening, and
using mandatory access controls, such as sVirt, SELinux, or
AppArmor.</para>
<section xml:id="ch052_devices-idp490976">
<title>Minimizing the QEMU code base</title>
<para>
One classic security principle is to remove any unused
components from your system. QEMU provides support for many
different virtual hardware devices. However, only a small
number of devices are needed for a given instance. Most
instances will use the virtio devices, but some legacy
instances will need access to specific hardware, which can
be specified using glance metadata:</para>
<screen><prompt>$</prompt> <userinput>glance image-update \
    --property hw_disk_bus=ide \
    --property hw_cdrom_bus=ide \
    --property hw_vif_model=e1000 \
    f16-x86_64-openstack-sda</userinput></screen>
<para>
A cloud architect should decide what devices to make
available to cloud users. Anything that is not needed should
be removed from QEMU. This step requires recompiling QEMU
after modifying the options passed to the QEMU configure
script. For a complete, up-to-date list of options, run
<command>./configure --help</command> from within the QEMU
source directory. Decide what is needed for your deployment,
and disable the remaining options.</para>
</section>
<section xml:id="ch052_devices-idp494336">
<title>Compiler hardening</title>
<para>
The next step is to harden QEMU using compiler hardening
options. Modern compilers provide a variety of compile-time
options to improve the security of the resulting
binaries. These features, which we will describe in more
detail below, include relocation read-only (RELRO), stack
canaries, never execute (NX), position-independent
executable (PIE), and address space layout randomization
(ASLR).</para>
<para>
Many modern Linux distributions already build QEMU with
compiler hardening enabled, so you may want to verify your
existing executable before proceeding with the information
below. One tool that can assist you with this verification
is called <link
xlink:href="http://www.trapkit.de/tools/checksec.html"><literal>checksec.sh</literal></link>.</para>
<variablelist>
<varlistentry>
<term>RELocation Read-Only (RELRO)</term>
<listitem>
<para>
Hardens the data sections of an executable. Both full
and partial RELRO modes are supported by GCC. For QEMU,
full RELRO is your best choice. This will make the
global offset table read-only and place various
internal data sections before the program data section
in the resulting executable.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Stack canaries</term>
<listitem>
<para>
Places values on the stack and verifies their presence
to help prevent buffer overflow attacks.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Never eXecute (NX)</term>
<listitem>
<para>
Also known as Data Execution Prevention (DEP), ensures
that data sections of the executable cannot be
executed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Position Independent Executable (PIE)</term>
<listitem>
<para>
Produces a position-independent executable, which is
necessary for ASLR.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Address Space Layout Randomization (ASLR)</term>
<listitem>
<para>
Randomizes the placement of both code and data
regions. The kernel applies ASLR (all modern Linux
kernels support it) when the executable
is built with PIE.</para>
</listitem>
</varlistentry>
</variablelist>
<para>
Putting this all together, and adding in some additional
useful protections, we recommend the following compiler
options for GCC when compiling QEMU:</para>
<programlisting>CFLAGS="-arch x86_64 -fstack-protector-all -Wstack-protector \
--param ssp-buffer-size=4 -pie -fPIE -ftrapv -D_FORTIFY_SOURCE=2 -O2 \
-Wl,-z,relro,-z,now"</programlisting>
<para>
We recommend testing your QEMU executable file after it is
compiled to ensure that the compiler hardening worked
properly.</para>
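<para>
One way to sketch such a test, as a rough complement to a
dedicated checker, is to read the ELF headers of the executable
directly. The Python function below is a minimal illustration
under several assumptions: the binary is a 64-bit little-endian
ELF file, the name <literal>elf_hardening</literal> is
hypothetical, and the result is only an indicator. In particular,
the presence of a RELRO segment does not distinguish full from
partial RELRO, and the ELF type that marks a PIE is also used for
shared libraries.</para>
<programlisting language="python"><![CDATA[import struct

ET_DYN = 3                 # ELF type used by PIE binaries (and shared objects)
PT_GNU_STACK = 0x6474E551  # program header describing stack permissions
PT_GNU_RELRO = 0x6474E552  # program header marking the RELRO region
PF_X = 0x1                 # "executable" segment flag


def elf_hardening(data):
    """Report PIE, NX and RELRO indicators for a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # Fixed offsets within the ELF64 file header.
    e_type = struct.unpack_from("<H", data, 16)[0]
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    result = {"pie": e_type == ET_DYN, "nx": False, "relro": False}
    for i in range(e_phnum):
        # p_type and p_flags are the first two fields of an ELF64 program header.
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            result["nx"] = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            result["relro"] = True
    return result]]></programlisting>
<para>
The function takes the raw bytes of the executable, for example
the contents of the QEMU binary read in binary mode. A value of
<literal>False</literal> for any key suggests that the
corresponding protection deserves a closer look.</para>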
<para>
Most cloud deployments will not want to build software such
as QEMU by hand. It is better to use packaging to ensure
that the process is repeatable and to ensure that the end
result can be easily deployed throughout the cloud. The
references below provide some additional details on applying
compiler hardening options to existing packages.</para>
<itemizedlist>
<listitem>
<para>DEB packages: <link xlink:href="http://wiki.debian.org/HardeningWalkthrough">Hardening Walkthrough</link></para>
</listitem>
<listitem>
<para>RPM packages: <link xlink:href="http://fedoraproject.org/wiki/How_to_create_an_RPM_package">How to create an RPM package</link></para>
</listitem>
</itemizedlist>
</section>
<section xml:id="ch052_devices-idp508032">
<title>Mandatory access controls</title>
<para>
Compiler hardening makes it more difficult to attack the
QEMU process. However, if an attacker does succeed, we would
like to limit the impact of the attack. Mandatory access
controls accomplish this by restricting the privileges of the
QEMU process to only what is needed. This can be
accomplished using sVirt with SELinux, or using AppArmor. When using
sVirt, SELinux is configured to run every QEMU process under
a different security context. AppArmor can be configured to
provide similar functionality. We provide more details on
sVirt in the sVirt section below.</para>
</section>
</section>
<section xml:id="ch052_devices-idp510512">
<title>sVirt: SELinux and virtualization</title>
<para>
With unique kernel-level architecture and National Security
Agency (NSA) developed security mechanisms, KVM provides
foundational isolation technologies for multi-tenancy. With
developmental origins dating back to 2002, the Secure
Virtualization (sVirt) technology is the application of
SELinux to modern-day virtualization. SELinux, which was
designed to apply separation control based upon labels, has
been extended to provide isolation between virtual machine
processes, devices, data files, and system processes acting
upon their behalf.</para>
<para>
OpenStack's sVirt implementation aspires to protect hypervisor
hosts and virtual machines against two primary threat
vectors:</para>
<itemizedlist><listitem>
<para><emphasis role="bold">Hypervisor threats</emphasis> A
compromised application running within a virtual machine
attacks the hypervisor to access underlying resources, such as
the host OS, applications, or devices within the
physical machine. This is a threat vector unique to
virtualization and represents considerable risk as the
underlying real machine can be compromised due to a
vulnerability in a single virtual application.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Virtual Machine (multi-tenant)
threats</emphasis> A compromised application running
within a VM attacks the hypervisor to access or control
another virtual machine and its resources. This is a
threat vector unique to virtualization and represents
considerable risk as a multitude of virtual machine file
images could be compromised due to a vulnerability in a
single application. This virtual network attack is a
major concern as the administrative techniques for
protecting real networks do not directly apply to the
virtual environment.</para>
</listitem>
</itemizedlist>
<para>
Each KVM-based virtual machine is a process which is labeled
by SELinux, effectively establishing a security boundary
around each virtual machine. This security boundary is
monitored and enforced by the Linux kernel, restricting the
virtual machine's access to resources outside of its boundary
such as host machine data files or other VMs.</para>
<para>
<inlinemediaobject>
<imageobject role="html">
<imagedata contentdepth="583" contentwidth="1135"
fileref="static/sVirt Diagram 1.png" format="PNG"
scalefit="1"/>
</imageobject>
<imageobject role="fo">
<imagedata contentdepth="100%" fileref="static/sVirt Diagram 1.png"
format="PNG" scalefit="1" width="100%"/>
</imageobject>
</inlinemediaobject>
</para>
<para>
As shown above, sVirt isolation is provided regardless of the
guest operating system running inside the virtual
machine: Linux or Windows VMs can be used. Additionally,
many Linux distributions provide SELinux within the operating
system, allowing the virtual machine to protect internal
virtual resources from threats.
</para>
<section xml:id="ch052_devices-idp523744">
<title>Labels and categories</title>
<para>
KVM-based virtual machine instances are labeled with their
own SELinux data type, known as <literal>svirt_image_t</literal>. Kernel-level
protections prevent unauthorized system processes, such as
malware, from manipulating the virtual machine image files
on disk. When virtual machines are powered off, images are
stored as <literal>svirt_image_t</literal> as shown below:</para>
<programlisting>system_u:object_r:svirt_image_t:SystemLow image1
system_u:object_r:svirt_image_t:SystemLow image2
system_u:object_r:svirt_image_t:SystemLow image3
system_u:object_r:svirt_image_t:SystemLow image4</programlisting>
<para>
The <literal>svirt_image_t</literal> label uniquely
identifies image files on disk, allowing for the SELinux
policy to restrict access. When a KVM-based Compute image is
powered on, sVirt appends a random numerical identifier to
the image. sVirt is technically capable of assigning
numerical identifiers to 524,288 virtual machines per
hypervisor node; however, OpenStack deployments are highly
unlikely to encounter this limitation.</para>
<para>This example shows the sVirt category identifier:</para>
<programlisting>system_u:object_r:svirt_image_t:s0:c87,c520 image1
system_u:object_r:svirt_image_t:s0:c419,c172 image2</programlisting>
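<para>
To make the structure of these labels concrete, the following
Python sketch splits a context string into its fields and
compares the category sets of two images. It is a simplified
illustration only: the function names are hypothetical, category
ranges such as <literal>c0.c1023</literal> are not expanded, and
the equality test stands in for the dominance rules that the
SELinux policy actually evaluates in the kernel.</para>
<programlisting language="python"><![CDATA[def parse_context(context):
    """Split an SELinux context into user, role, type, sensitivity, categories."""
    user, role, setype, level = context.split(":", 3)
    # The level is "sensitivity" optionally followed by ":cat,cat,...".
    sensitivity, _, cats = level.partition(":")
    categories = frozenset(c for c in cats.split(",") if c)
    return {"user": user, "role": role, "type": setype,
            "sensitivity": sensitivity, "categories": categories}


def same_categories(context_a, context_b):
    """Return True when two contexts carry exactly the same category set."""
    return parse_context(context_a)["categories"] == parse_context(context_b)["categories"]


image1 = "system_u:object_r:svirt_image_t:s0:c87,c520"
image2 = "system_u:object_r:svirt_image_t:s0:c419,c172"
print(parse_context(image1)["type"])
print(same_categories(image1, image2))]]></programlisting>
<para>
Both images share the <literal>svirt_image_t</literal> type, so it
is the differing category pairs that keep one virtual machine
from reaching the other's image.</para>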
</section>
<section xml:id="ch052_devices-idp527632">
<title>Booleans</title>
<para>
To ease the administrative burden of managing SELinux, many
enterprise Linux platforms use SELinux Booleans to
quickly change the security posture of sVirt.</para>
<para>
Red Hat Enterprise Linux-based KVM deployments use the
following sVirt Booleans:</para>
<informaltable rules="all" width="80%"><colgroup><col/><col/></colgroup>
<thead>
<tr>
<td><para><emphasis role="bold">sVirt SELinux Boolean</emphasis></para></td>
<td><para><emphasis role="bold">Description</emphasis></para></td>
</tr>
</thead>
<tbody>
<tr>
<td><para>virt_use_comm</para></td>
<td><para>Allow virt to use serial/parallel communication ports.</para></td>
</tr>
<tr>
<td><para>virt_use_fusefs</para></td>
<td><para>Allow virt to read FUSE mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_nfs</para></td>
<td><para>Allow virt to manage NFS mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_samba</para></td>
<td><para>Allow virt to manage CIFS mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_sanlock</para></td>
<td><para>Allow confined virtual guests to interact with the sanlock.</para></td>
</tr>
<tr>
<td><para>virt_use_sysfs</para></td>
<td><para>Allow virt to manage device configuration (PCI).</para></td>
</tr>
<tr>
<td><para>virt_use_usb</para></td>
<td><para>Allow virt to use USB devices.</para></td>
</tr>
<tr>
<td><para>virt_use_xserver</para></td>
<td><para>Allow virtual machine to interact with the X Window System.</para></td>
</tr>
</tbody>
</informaltable>
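<para>
When auditing a host, it can help to see at a glance which of
these Booleans are switched on. The Python sketch below parses
text in the <literal>name --> state</literal> form printed by
<command>getsebool -a</command> and returns the enabled
<literal>virt_use_</literal> entries. The function name is
hypothetical and the sample text is illustrative, not captured
from a real system.</para>
<programlisting language="python"><![CDATA[def enabled_virt_booleans(getsebool_output):
    """Return the sorted names of virt_use_* booleans reported as 'on'."""
    enabled = []
    for line in getsebool_output.splitlines():
        name, sep, state = line.partition("-->")
        # Skip lines that are not in the "name --> state" form.
        if sep and name.strip().startswith("virt_use_") and state.strip() == "on":
            enabled.append(name.strip())
    return sorted(enabled)


# Illustrative sample, not captured from a real host.
SAMPLE = """\
virt_use_nfs --> on
virt_use_samba --> off
virt_use_usb --> on
"""
print(enabled_virt_booleans(SAMPLE))]]></programlisting>
<para>
Each Boolean that is enabled widens what a confined guest may
reach, so any entry in the result that the deployment does not
need is a candidate for being turned off.</para>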
</section>
</section>
</chapter>