<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!ENTITY % openstack SYSTEM "../common/entities/openstack.ent">
%openstack;
]>
<chapter xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://docbook.org/ns/docbook"
version="5.0"
xml:id="ch052_devices">
<?dbhtml stop-chunking?>
<title>Hardening the virtualization layers</title>
<para>
This chapter discusses the use of both physical and virtual
hardware by instances, the associated security risks, and
recommendations for mitigating those risks. It concludes with a
discussion of sVirt, an open source project for integrating
SELinux mandatory access controls with the virtualization
components.</para>
<section xml:id="ch052_devices-idp479920">
<title>Physical hardware (PCI passthrough)</title>
<para>
Many hypervisors offer a functionality known as PCI
passthrough. This allows an instance to have direct access to
a piece of hardware on the node. For example, this could be
used to allow instances to access video cards offering the
Compute Unified Device Architecture (CUDA) for high
performance computation. This feature carries two types of
security risks: direct memory access and hardware
infection.</para>
<para>
Direct memory access (DMA) is a feature that permits certain
hardware devices to access arbitrary physical memory addresses
in the host computer. Often video cards have this
capability. However, an instance should not be given arbitrary
physical memory access because this would give it full view of
both the host system and other instances running on the same
node. Hardware vendors use an input/output memory management
unit (IOMMU) to manage DMA access in these
situations. Therefore, cloud architects should ensure that the
hypervisor is configured to utilize this hardware
feature.</para>
<itemizedlist>
<listitem>
<para>KVM: <link
xlink:href="http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM">How
to assign devices with VT-d in KVM</link></para>
</listitem>
<listitem>
<para>Xen: <link xlink:href="http://wiki.xen.org/wiki/VTd_HowTo">VTd Howto</link>
</para>
</listitem>
</itemizedlist>
<note>
<para>
The IOMMU feature is marketed as VT-d by Intel and AMD-Vi by
AMD.</para>
</note>
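<para>
For example, on a Linux KVM host the IOMMU is typically enabled
through a kernel boot parameter and can be verified from the
kernel log. The following sketch uses
<command>grubby</command>, which is specific to Red Hat-based
systems; on other distributions, edit the boot loader
configuration directly, and use <literal>amd_iommu=on</literal>
on AMD hardware:</para>
<screen><prompt>#</prompt> <userinput>grubby --update-kernel=ALL --args="intel_iommu=on"</userinput>
<prompt>#</prompt> <userinput>reboot</userinput>
<prompt>#</prompt> <userinput>dmesg | grep -i -e DMAR -e IOMMU</userinput></screen>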
<para>
A hardware infection occurs when an instance makes a malicious
modification to the firmware or some other part of a
device. As this device is used by other instances, or even the
host OS, the malicious code can spread into these systems. The
end result is that one instance can run code outside of its
security domain. While this is a potential problem in any
hardware sharing scenario, PCI passthrough makes it worse
because it is harder to reset the state of physical hardware
than virtual hardware.</para>
<para>
Solutions to the hardware infection problem are domain
specific. The strategy is to identify how an instance can
modify hardware state, and then determine how to reset any
modifications when the instance is done using the
hardware. For example, one option could be to re-flash the
firmware after use. Clearly there is a need to balance
hardware longevity with security, as some firmware will fail
after a large number of writes. TPM technology, described in
<xref linkend="ch013_node-bootstrapping-idp44768"/>, provides
a solution for detecting unauthorized firmware
changes. Regardless of the strategy selected, it is important
to understand the risks associated with this kind of hardware
sharing so that they can be properly mitigated for a given
deployment scenario.
</para>
<para>
Additionally, due to the risk and complexity associated with
PCI passthrough, it should be disabled by default. If it is
enabled for a specific need, you must have appropriate
processes in place to ensure the hardware is clean before it is
re-issued.</para>
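<para>
As a minimal sketch, you can confirm that Compute does not
expose any host PCI devices by leaving the PCI passthrough
options unset in <filename>nova.conf</filename>. The option
names shown here (<literal>pci_passthrough_whitelist</literal>
and <literal>pci_alias</literal>) may differ between
releases:</para>
<programlisting>[DEFAULT]
# Leave these options unset so that no host PCI devices are
# offered to instances.
# pci_passthrough_whitelist =
# pci_alias =</programlisting>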
</section>
<section xml:id="ch052_devices-idp488320">
<title>Virtual hardware (QEMU)</title>
<para>
Virtual hardware is the software layer that provides the
hardware interface for a running virtual
machine. Instances use this functionality to provide network,
storage, video, and other devices that may be needed. With
this in mind, most instances in your environment will
exclusively use virtual hardware, with a minority that will
require direct hardware access. The major open source
hypervisors use QEMU for this functionality. While QEMU fills
an important need for virtualization platforms, it has proven
to be a very challenging software project to write and
maintain. Much of the functionality in QEMU is implemented
with low-level code that is difficult for most developers to
comprehend. Furthermore, the hardware virtualized by QEMU
includes many legacy devices that have their own set of
quirks. Putting all of this together, QEMU has been the source
of many security problems, including hypervisor breakout
attacks.</para>
<para>
For the reasons stated above, it is important to take
proactive steps to harden QEMU. We recommend three specific
steps: minimizing the code base, using compiler hardening, and
using mandatory access controls, such as sVirt, SELinux, or
AppArmor.</para>
<section xml:id="ch052_devices-idp490976">
<title>Minimizing the QEMU code base</title>
<para>
One classic security principle is to remove any unused
components from your system. QEMU provides support for many
different virtual hardware devices. However, only a small
number of devices are needed for a given instance. Most
instances will use the virtio devices. However, some legacy
instances will need access to specific hardware, which can
be specified using glance metadata:</para>
<screen><prompt>$</prompt> <userinput>glance image-update \
--property hw_disk_bus=ide \
--property hw_cdrom_bus=ide \
--property hw_vif_model=e1000 \
f16-x86_64-openstack-sda</userinput></screen>
<para>
A cloud architect should decide what devices to make
available to cloud users. Anything that is not needed should
be removed from QEMU. This step requires recompiling QEMU
after modifying the options passed to the QEMU configure
script. For a complete, up-to-date list of options, run
<command>./configure --help</command> from within the QEMU
source directory. Decide what is needed for your deployment,
and disable the remaining options.</para>
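<para>
The following sketch illustrates the approach. The available
<literal>--disable</literal> flags vary between QEMU versions,
so check the output of <command>./configure --help</command>
for the options that your source tree actually
supports:</para>
<screen><prompt>$</prompt> <userinput>./configure --target-list=x86_64-softmmu \
  --disable-sdl --disable-vnc --disable-curses \
  --disable-usb-redir</userinput>
<prompt>$</prompt> <userinput>make</userinput></screen>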
</section>
<section xml:id="ch052_devices-idp494336">
<title>Compiler hardening</title>
<para>
The next step is to harden QEMU using compiler hardening
options. Modern compilers provide a variety of compile time
options to improve the security of the resulting
binaries. These features, which we will describe in more
detail below, include relocation read-only (RELRO), stack
canaries, never execute (NX), position independent
executable (PIE), and address space layout randomization
(ASLR).</para>
<para>
Many modern Linux distributions already build QEMU with
compiler hardening enabled, so you may want to verify your
existing executable before proceeding with the information
below. One tool that can assist you with this verification
is called <link
xlink:href="http://www.trapkit.de/tools/checksec.html"><literal>checksec.sh</literal></link>.</para>
<variablelist>
<varlistentry>
<term>RELocation Read-Only (RELRO)</term>
<listitem>
<para>
Hardens the data sections of an executable. Both full
and partial RELRO modes are supported by GCC. For QEMU,
full RELRO is your best choice. This will make the
global offset table read-only and place various
internal data sections before the program data section
in the resulting executable.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Stack canaries</term>
<listitem>
<para>
Places values on the stack and verifies their presence
to help prevent buffer overflow attacks.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Never eXecute (NX)</term>
<listitem>
<para>
Also known as Data Execution Prevention (DEP), ensures
that data sections of the executable can not be
executed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Position Independent Executable (PIE)</term>
<listitem>
<para>
Produces a position independent executable, which is
necessary for ASLR.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Address Space Layout Randomization (ASLR)</term>
<listitem>
<para>
Ensures that the placement of both code and data
regions is randomized. ASLR is enabled by the kernel (all
modern Linux kernels support ASLR) when the executable
is built with PIE.</para>
</listitem>
</varlistentry>
</variablelist>
<para>
Putting this all together, and adding in some additional
useful protections, we recommend the following compiler
options for GCC when compiling QEMU:</para>
<programlisting>CFLAGS="-arch x86_64 -fstack-protector-all -Wstack-protector \
--param ssp-buffer-size=4 -pie -fPIE -ftrapv -D_FORTIFY_SOURCE=2 -O2 \
-Wl,-z,relro,-z,now"</programlisting>
<para>
We recommend testing your QEMU executable file after it is
compiled to ensure that the compiler hardening worked
properly.</para>
<para>
Most cloud deployments will not want to build software such
as QEMU by hand. It is better to use packaging to ensure
that the process is repeatable and to ensure that the end
result can be easily deployed throughout the cloud. The
references below provide some additional details on applying
compiler hardening options to existing packages.</para>
<itemizedlist>
<listitem>
<para>DEB packages: <link xlink:href="http://wiki.debian.org/HardeningWalkthrough">Hardening Walkthrough</link></para>
</listitem>
<listitem>
<para>RPM packages: <link xlink:href="http://fedoraproject.org/wiki/How_to_create_an_RPM_package">How to create an RPM package</link></para>
</listitem>
</itemizedlist>
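<para>
As an illustrative sketch, Debian-based packages can request
hardening flags through <command>dpkg-buildflags</command>, and
RPM-based packages can enable the distribution hardening macro
in the spec file. The exact mechanisms depend on the
distribution release:</para>
<programlisting># Debian/Ubuntu: request all hardening options, then inspect the
# resulting compiler flags
export DEB_BUILD_MAINT_OPTIONS="hardening=+all"
dpkg-buildflags --get CFLAGS

# Fedora/RHEL: enable the hardened (PIE) build macro in the RPM spec file
%global _hardened_build 1</programlisting>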
</section>
<section xml:id="ch052_devices-idp508032">
<title>Mandatory access controls</title>
<para>
Compiler hardening makes it more difficult to attack the
QEMU process. However, if an attacker does succeed, we would
like to limit the impact of the attack. Mandatory access
controls accomplish this by restricting the privileges of the
QEMU process to only what is needed. This can be
accomplished using sVirt (SELinux) or AppArmor. When using
sVirt, SELinux is configured to run every QEMU process under
a different security context. AppArmor can be configured to
provide similar functionality. We provide more details on
sVirt in the instance isolation section below.</para>
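<para>
The mandatory access control driver that libvirt applies to
QEMU processes is selected in
<filename>/etc/libvirt/qemu.conf</filename>. As a minimal
sketch, an SELinux-based platform would use the following
setting (AppArmor-based platforms use
<literal>"apparmor"</literal> instead):</para>
<programlisting>security_driver = "selinux"</programlisting>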
</section>
</section>
<section xml:id="ch052_devices-idp510512">
<title>sVirt: SELinux and virtualization</title>
<para>
With its unique kernel-level architecture and National Security
Agency (NSA) developed security mechanisms, KVM provides
foundational isolation technologies for multi-tenancy. With
developmental origins dating back to 2002, the Secure
Virtualization (sVirt) technology is the application of
SELinux to modern-day virtualization. SELinux, which was
designed to apply separation control based upon labels, has
been extended to provide isolation between virtual machine
processes, devices, data files, and the system processes acting
on their behalf.</para>
<para>
OpenStack's sVirt implementation aspires to protect hypervisor
hosts and virtual machines against two primary threat
vectors:</para>
<itemizedlist><listitem>
<para><emphasis role="bold">Hypervisor threats</emphasis> A
compromised application running within a virtual machine
attacks the hypervisor to access underlying resources. For
example, the host OS, applications, or devices within the
physical machine. This is a threat vector unique to
virtualization and represents considerable risk as the
underlying real machine can be compromised due to
vulnerability in a single virtual application.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Virtual Machine (multi-tenant)
threats</emphasis> A compromised application running
within a VM attacks the hypervisor to access/control
another virtual machine and its resources. This is a
threat vector unique to virtualization and represents
considerable risk as a multitude of virtual machine file
images could be compromised due to vulnerability in a
single application. This virtual network attack is a
major concern as the administrative techniques for
protecting real networks do not directly apply to the
virtual environment.</para>
</listitem>
</itemizedlist>
<para>
Each KVM-based virtual machine is a process which is labeled
by SELinux, effectively establishing a security boundary
around each virtual machine. This security boundary is
monitored and enforced by the Linux kernel, restricting the
virtual machine's access to resources outside of its boundary
such as host machine data files or other VMs.</para>
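<para>
As a quick illustration, you can inspect the security context
of each running QEMU process on a compute node; every instance
should carry its own MCS category pair:</para>
<screen><prompt>#</prompt> <userinput>ps -eZ | grep qemu</userinput></screen>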
<para>
<inlinemediaobject>
<imageobject role="html">
<imagedata contentdepth="583" contentwidth="1135"
fileref="static/sVirt Diagram 1.png" format="PNG"
scalefit="1"/>
</imageobject>
<imageobject role="fo">
<imagedata contentdepth="100%" fileref="static/sVirt Diagram 1.png"
format="PNG" scalefit="1" width="100%"/>
</imageobject>
</inlinemediaobject>
</para>
<para>
As shown above, sVirt isolation is provided regardless of the
guest operating system running inside the virtual
machine&mdash;Linux or Windows VMs can be used. Additionally,
many Linux distributions provide SELinux within the operating
system, allowing the virtual machine to protect internal
virtual resources from threats.
</para>
<section xml:id="ch052_devices-idp523744">
<title>Labels and categories</title>
<para>
KVM-based virtual machine instances are labeled with their
own SELinux data type, known as
<literal>svirt_image_t</literal>. Kernel-level
protections prevent unauthorized system processes, such as
malware, from manipulating the virtual machine image files
on disk. When virtual machines are powered off, images are
stored as <literal>svirt_image_t</literal> as shown below:</para>
<programlisting>system_u:object_r:svirt_image_t:SystemLow image1
system_u:object_r:svirt_image_t:SystemLow image2
system_u:object_r:svirt_image_t:SystemLow image3
system_u:object_r:svirt_image_t:SystemLow image4</programlisting>
<para>
The <literal>svirt_image_t</literal> label uniquely
identifies image files on disk, allowing for the SELinux
policy to restrict access. When a KVM-based Compute image is
powered on, sVirt appends a random numerical identifier to
the image. sVirt is technically capable of assigning
numerical identifiers to 524,288 virtual machines per
hypervisor node; however, OpenStack deployments are highly
unlikely to encounter this limitation.</para>
<para>This example shows the sVirt category identifier:</para>
<programlisting>system_u:object_r:svirt_image_t:s0:c87,c520 image1
system_u:object_r:svirt_image_t:s0:c419,c172 image2</programlisting>
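<para>
To view these labels on a running system, list the instance
storage with the SELinux context displayed. The path below is
the default Compute instances directory and may differ in your
deployment:</para>
<screen><prompt>#</prompt> <userinput>ls -Z /var/lib/nova/instances/*/disk</userinput></screen>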
</section>
<section xml:id="ch052_devices-idp527632">
<title>Booleans</title>
<para>
To ease the administrative burden of managing SELinux, many
enterprise Linux platforms use SELinux booleans to
quickly change the security posture of sVirt.</para>
<para>
Red Hat Enterprise Linux-based KVM deployments utilize the
following sVirt booleans:</para>
<informaltable rules="all" width="80%"><colgroup><col/><col/></colgroup>
<thead>
<tr>
<td><para><emphasis role="bold">sVirt SELinux Boolean</emphasis></para></td>
<td><para><emphasis role="bold">Description</emphasis></para></td>
</tr>
</thead>
<tbody>
<tr>
<td><para>virt_use_common</para></td>
<td><para>Allow virt to use serial/parallel communication ports.</para></td>
</tr>
<tr>
<td><para>virt_use_fusefs</para></td>
<td><para>Allow virt to read FUSE mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_nfs</para></td>
<td><para>Allow virt to manage NFS mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_samba</para></td>
<td><para>Allow virt to manage CIFS mounted files.</para></td>
</tr>
<tr>
<td><para>virt_use_sanlock</para></td>
<td><para>Allow confined virtual guests to interact with sanlock.</para></td>
</tr>
<tr>
<td><para>virt_use_sysfs</para></td>
<td><para>Allow virt to manage device configuration (PCI).</para></td>
</tr>
<tr>
<td><para>virt_use_usb</para></td>
<td><para>Allow virt to use USB devices.</para></td>
</tr>
<tr>
<td><para>virt_use_xserver</para></td>
<td><para>Allow virtual machine to interact with the X Window System.</para></td>
</tr>
</tbody>
</informaltable>
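<para>
Booleans are queried and set with the standard SELinux tools.
For example, to list the current virtualization-related
booleans and persistently allow guests to use NFS-backed
storage:</para>
<screen><prompt>#</prompt> <userinput>getsebool -a | grep virt_use</userinput>
<prompt>#</prompt> <userinput>setsebool -P virt_use_nfs on</userinput></screen>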
</section>
</section>
</chapter>