Merge "Add some tuning documentation"

commit 71ebba5cf3
@@ -44,6 +44,8 @@ Requirements

Using IPA requires it to be present and configured on the deploy ramdisk, see
:ref:`deploy-ramdisk`.

.. _ipa-proxies:

Using proxies for image download
================================
@@ -53,6 +53,7 @@ Advanced Topics

   Agent Token <agent-token>
   Deploying without BMC Credentials <agent-power>
   Layer 3 or DHCP-less Ramdisk Booting <dhcp-less>
   Tuning Ironic <tuning>

.. toctree::
   :hidden:
doc/source/admin/tuning.rst (new file, 141 lines)

@@ -0,0 +1,141 @@
=============
Tuning Ironic
=============

Memory Utilization
==================
Memory utilization is a difficult thing to tune in Ironic, as we may be asked
by API consumers to perform work for which the underlying tools require large
amounts of memory.

The biggest example of this is image conversion. Images not in raw format
need to be written out to disk (local files, or remote in the ``iscsi``
deploy), which requires the conversion process to generate an in-memory map
to re-assemble the image contents into a coherent stream of data. This entire
process also stresses the kernel buffers and cache.
This ultimately comes down to a trade-off of memory versus performance,
similar to the trade-off of performance versus cost.

On the plus side, an idle Ironic deployment does not need much in the way of
memory. On the down side, operators of a highly bursty environment, where a
large number of concurrent deployments may be requested, should consider two
aspects:

* How is the ironic-api service/process set up? Will more processes be
  launched automatically?
* Are images prioritized for storage size on disk, or are they compressed
  and thus require format conversion?
API
===

Ironic's API should have a fairly stable memory footprint with activity;
however, depending on how the webserver is running the API, additional
processes can be launched.

Under normal conditions, as of Ironic 15.1, the ``ironic-api``
service/process consumes approximately 270MB of memory per worker. Depending
on how the process is launched, the number of workers and the maximum number
of request threads per worker may differ. Naturally, there are configuration
and performance trade-offs:
* Directly as a native Python process, i.e. executing ``ironic-api``. Each
  worker allows multiple requests to be handled and threaded at the same
  time, which can allow high levels of request concurrency. As of the
  Victoria cycle, a direct invocation of the ``ironic-api`` program will only
  launch a maximum of four workers.
* Launched via a wrapper such as Apache+uWSGI, which may allow for multiple
  distinct worker processes, but these workers typically limit the number of
  request processing threads permitted to execute. This means requests can
  stack up in the front-end webserver and be released to the ``ironic-api``
  as prior requests complete. In environments with long-running synchronous
  calls, such as use of the vendor passthru interface, this can be very
  problematic.

When the webserver is launched by the API process directly, the default
number of workers is based upon the number of CPU sockets in your machine.

When launching using uWSGI, this will entirely depend upon your
configuration, but balancing workers and threads based upon your load and
needs is highly advisable. Each worker process is unique and consumes far
more memory than a comparable number of worker threads. At the same time, the
scheduler will focus on worker processes, as the threads are greenthreads.
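
As a rough illustration of this worker/thread balance, a uWSGI configuration
might look like the following. This is a sketch only: the process and thread
counts, socket, and WSGI entry-point path are hypothetical and vary by
distribution and installation method, so adapt them to your environment.

.. code-block:: ini

   [uwsgi]
   ; Hypothetical values -- fewer processes with more threads lowers the
   ; total memory footprint at the cost of per-process concurrency.
   processes = 2
   threads = 8
   ; Hypothetical path to the ironic-api WSGI entry point; check where your
   ; packaging installs it.
   wsgi-file = /usr/bin/ironic-api-wsgi
   http-socket = 127.0.0.1:6385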

.. note::
   Host operating systems featuring in-memory de-duplication should see an
   improvement in the overall memory footprint with multiple processes, but
   this is not something the development team has measured, and it will vary
   based upon multiple factors.

One important item to note: each Ironic API service/process *does* keep a
copy of the hash ring, as generated from the database, *in memory*. This is
done to help allocate load across a cluster in line with how individual nodes
and their responsible conductors are allocated across the cluster. In other
words, your memory consumption *will* increase corresponding to the number of
nodes managed by each ironic conductor. It is important to understand that
features such as `conductor groups <./conductor-groups.rst>`_ mean that only
matching portions of nodes will be considered for the hash ring, if needed.

Conductor
=========

A conductor process will launch a number of other processes, as required, in
order to complete the requested work. Ultimately this means it can quickly
consume large amounts of memory when asked to complete a substantial amount
of work all at once.

The ``ironic-conductor``, as of ironic 15.1, consumes by default about 340MB
of RAM in an idle configuration, and operates as a single process. Additional
processes can be launched, but they must have unique resolvable hostnames and
addresses for JSON-RPC, or use a central oslo.messaging supported message
bus, in order for webserver-API-to-conductor communication to be functional.
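
As one illustration (not a complete configuration), JSON-RPC transport is
selected with the ``rpc_transport`` option; the hostname below is
hypothetical and must be resolvable by the API service.

.. code-block:: ini

   [DEFAULT]
   ; Use JSON-RPC instead of an oslo.messaging bus for API-to-conductor
   ; communication. Each conductor must then be reachable at a unique,
   ; resolvable hostname.
   rpc_transport = json-rpc
   ; Hypothetical hostname for this conductor.
   host = conductor1.example.com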

Typically, the most memory-intensive operation that can be triggered is an
image conversion for deployment, which is limited to 1GB of RAM per
conversion process.

Most deployments do, by default, have a concurrency limit depending on their
Compute configuration (see the
`nova.conf <https://docs.openstack.org/nova/latest/configuration/sample-config.html>`_
setting ``max_concurrent_builds``). However, this limit applies only per
``nova-compute`` worker, so naturally this concurrency will scale with
additional workers.

Stand-alone users can easily request deployments exceeding the Compute
service's default maximum concurrent builds. As such, if your environment is
used this way, you may wish to carefully consider your deployment
architecture.

With a single nova-compute process talking to a single conductor, asked to
perform ten concurrent deployments of images requiring conversion, the memory
needed may exceed 10GB. This does, however, entirely depend upon the image
block structure and layout, and on which deploy interface is being used.

What can I do?
==============

Earlier in this document, we suggested some architectural constraints and
limitations, but there are some additional things that can be done to
maximize performance. Again, this will vary greatly depending on your usage:

* Use the ``direct`` deploy interface. This offloads any final image
  conversion to the host running the ``ironic-python-agent``. Additionally,
  if Swift or another object storage service such as RadosGW is used,
  downloads can be completely separated from the host running the
  ``ironic-conductor``.
* Use small/compact "raw" images. Qcow2 files are generally compressed and
  require substantial amounts of memory to decompress and stream.
* Tune the internal memory limit for the conductor using the
  ``[DEFAULT]memory_required_minimum`` setting. This will help the conductor
  throttle back memory-intensive operations. The default should prevent
  out-of-memory conditions, but under extreme memory pressure this may still
  be sub-optimal. Before changing this setting, it is highly advisable to
  consult with your resident "Unix wizard", or even the Ironic development
  team in upstream IRC. This feature was added in the Wallaby development
  cycle.
* If network bandwidth is the problem you are seeking to solve, you may wish
  to explore a mix of the ``direct`` deploy interface and caching proxies.
  Such a configuration can be highly beneficial in wide-area deployments. See
  :ref:`Using proxies for image download <ipa-proxies>`.
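
To illustrate the memory-limit tuning above, a minimal sketch of the relevant
configuration follows. The value shown is hypothetical, and you should verify
the exact option name and default against the sample ``ironic.conf`` for your
release before relying on it.

.. code-block:: ini

   [DEFAULT]
   ; Hypothetical threshold in MiB: the conductor throttles back
   ; memory-intensive operations (such as image conversion) when available
   ; system memory drops below this value. Added in the Wallaby cycle.
   memory_required_minimum = 1024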