Merge "Add some tuning documentation"

commit 71ebba5cf3
@@ -44,6 +44,8 @@ Requirements
 Using IPA requires it to be present and configured on the deploy ramdisk, see
 :ref:`deploy-ramdisk`
 
+.. _ipa-proxies:
+
 Using proxies for image download
 ================================
 
@@ -53,6 +53,7 @@ Advanced Topics
    Agent Token <agent-token>
    Deploying without BMC Credentials <agent-power>
    Layer 3 or DHCP-less Ramdisk Booting <dhcp-less>
+   Tuning Ironic <tuning>
 
 .. toctree::
    :hidden:
doc/source/admin/tuning.rst (new file, 141 lines)
@@ -0,0 +1,141 @@
=============
Tuning Ironic
=============

Memory Utilization
==================

Memory utilization is a difficult thing to tune in Ironic, largely because
API consumers may ask us to perform work for which the underlying tools
require large amounts of memory.

The biggest example of this is image conversion. Images not in a raw format
need to be written out to disk (to local files, or to a remote target with
the ``iscsi`` deploy interface), which requires the conversion process to
generate an in-memory map to re-assemble the image contents into a coherent
stream of data. This entire process also stresses the kernel buffers and
cache.

This ultimately comes down to a trade-off of memory versus performance,
similar to the trade-off of performance versus cost.

On the plus side, an idle Ironic deployment does not need much in the way
of memory. On the down side, a highly bursty environment where a large
number of concurrent deployments may be requested should consider two
aspects:

* How is the ``ironic-api`` service/process set up? Will more
  processes be launched automatically?
* Are images prioritized for storage size on disk? Or are they compressed
  and require format conversion?

API
===

Ironic's API should have a fairly stable memory footprint with activity;
however, depending on how the webserver is running the API, additional
processes can be launched.

Under normal conditions, as of Ironic 15.1, the ``ironic-api`` service/process
consumes approximately 270MB of memory per worker. Depending on how the
process is being launched, the number of workers and the maximum number of
request threads per worker may differ. Naturally there are configuration and
performance trade-offs:

* Launched directly as a native Python process, i.e. by executing
  ``ironic-api``. Each single worker allows multiple requests to be handled
  and threaded at the same time, which can allow high levels of request
  concurrency. As of the Victoria cycle, a direct invocation of the
  ``ironic-api`` program will only launch a maximum of four workers.
* Launched via a wrapper such as Apache+uWSGI, which may allow for multiple
  distinct worker processes, but these workers typically limit the number of
  request processing threads that are permitted to execute. This means
  requests can stack up in the front-end webserver and be released to the
  ``ironic-api`` as prior requests complete. In environments with
  long-running synchronous calls, such as use of the vendor passthru
  interface, this can be very problematic.

When the webserver is launched by the API process directly, the default
number of workers is based upon the number of CPU sockets in your machine.
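
For example, assuming your release of ironic exposes an ``[api]api_workers``
option in ``ironic.conf`` (check your release's configuration reference),
the worker count can be pinned rather than derived from the hardware:

.. code-block:: ini

   [api]
   # Example only: pin the number of API worker processes instead of
   # letting the service derive it from the CPU count (capped at four).
   api_workers = 4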

When launching using uWSGI, this will vary entirely with your configuration,
but balancing workers and threads based upon your load and needs is highly
advisable. Each worker process is unique and consumes far more memory than
a comparable number of worker threads. At the same time, the scheduler will
focus on worker processes, as the threads are greenthreads.
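
As a rough sketch only (the paths, counts, and socket location here are
assumptions to adapt to your own deployment), a uWSGI configuration that
balances a few heavier worker processes against cheaper request threads
might look like:

.. code-block:: ini

   [uwsgi]
   ; hypothetical paths and values -- adjust for your environment
   wsgi-file = /usr/bin/ironic-api-wsgi
   socket = /run/uwsgi/ironic-api.sock
   ; a few distinct worker processes...
   processes = 4
   ; ...each serving several request threads, which cost far less
   ; memory than additional processes would
   threads = 8
   enable-threads = true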

.. note::
   Host operating systems featuring in-memory de-duplication should see
   an improvement in the overall memory footprint with multiple processes,
   but this is not something the development team has measured, and it
   will vary based upon multiple factors.

One important item to note: each Ironic API service/process *does* keep a
copy of the hash ring, as generated from the database, *in memory*. This is
done to help distribute load across a cluster in line with how individual
nodes and their responsible conductors are allocated across the cluster.
In other words, the amount of memory used WILL increase with the number of
nodes managed by each ironic conductor. It is important to understand that
features such as `conductor groups <./conductor-groups.rst>`_ mean that only
the matching portion of nodes will be considered for the hash ring, if
needed.
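
For instance, a conductor can be restricted to a subset of nodes via the
``[conductor]conductor_group`` option (a minimal sketch; the group name is
an arbitrary example):

.. code-block:: ini

   [conductor]
   # Only nodes whose conductor_group field matches this value will be
   # managed by (and hashed to) this conductor.
   conductor_group = rack42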

Conductor
=========

A conductor process will launch a number of other processes, as required,
in order to complete the requested work. Ultimately this means it can
quickly consume large amounts of memory because it was asked to complete a
substantial amount of work all at once.

As of ironic 15.1, the ``ironic-conductor`` service consumes about 340MB of
RAM by default in an idle configuration. It operates, by default, as a
single process. Additional processes can be launched, but they must have
unique, resolvable hostnames and addresses for JSON-RPC, or use a central
oslo.messaging-supported message bus, in order for API-to-conductor
communication to be functional.
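
A minimal sketch of the relevant ``ironic.conf`` settings for one of several
conductors (the hostname is a placeholder, and ``rpc_transport`` values vary
by release, so verify against your configuration reference):

.. code-block:: ini

   [DEFAULT]
   # Each conductor must have a unique, resolvable name.
   host = conductor-1.example.com
   # Use JSON-RPC between the API and conductors; alternatively, leave
   # this at its default to use an oslo.messaging bus such as RabbitMQ.
   rpc_transport = json-rpc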

Typically, the most memory-intensive operation that can be triggered is an
image conversion for deployment, which is limited to 1GB of RAM per
conversion process.

Most deployments, by default, do have a concurrency limit from their
Compute configuration (see the ``max_concurrent_builds`` setting in
`nova.conf <https://docs.openstack.org/nova/latest/configuration/sample-config.html>`_).
However, this limit applies per ``nova-compute`` worker, so naturally the
overall concurrency will scale with additional workers.
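
For example, in ``nova.conf`` on each compute host (10 is nova's documented
default, shown here purely for illustration):

.. code-block:: ini

   [DEFAULT]
   # Maximum number of instance builds this nova-compute worker will
   # run concurrently; each build may trigger a conductor-side image
   # conversion when non-raw images are used.
   max_concurrent_builds = 10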

Stand-alone users can easily request deployments exceeding the Compute
service's default maximum number of concurrent builds. As such, if your
environment is used this way, you may wish to carefully consider your
deployment architecture.

With a single ``nova-compute`` process talking to a single conductor, asked
to perform ten concurrent deployments of images requiring conversion, the
memory needed may exceed 10GB. This does, however, entirely depend upon the
image block structure and layout, and upon which deploy interface is being
used.

What can I do?
==============

Earlier in this document we suggested some architectural constraints and
limitations, but there are also some things that can be done to maximize
performance. Again, this will vary greatly depending on your usage.

* Use the ``direct`` deploy interface. This offloads any final image
  conversion to the host running the ``ironic-python-agent``. Additionally,
  if Swift or another object storage service such as RadosGW is used,
  downloads can be completely separated from the host running the
  ``ironic-conductor``.
* Use small/compact "raw" images. Qcow2 files are generally compressed
  and require substantial amounts of memory to decompress and stream.
  See the conversion example after this list.
* Tune the internal memory limit for the conductor using the
  ``[DEFAULT]memory_required_minimum`` setting, shown in the sketch after
  this list. This will help the conductor throttle back memory-intensive
  operations. The default should prevent out-of-memory conditions, but
  under extreme memory pressure this may still be sub-optimal. Before
  changing this setting, it is highly advised to consult with your resident
  "Unix wizard", or even the Ironic development team in upstream IRC. This
  feature was added in the Wallaby development cycle.
* If network bandwidth is the problem you are seeking to solve, you may
  wish to explore a mix of the ``direct`` deploy interface and caching
  proxies. Such a configuration can be highly beneficial in wide-area
  deployments. See :ref:`Using proxies for image download <ipa-proxies>`.
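
A raw image can be produced ahead of time so that neither the conductor nor
the ramdisk has to convert it during deployment. A minimal sketch (the image
names are placeholders):

.. code-block:: console

   $ qemu-img convert -O raw ubuntu-20.04.qcow2 ubuntu-20.04.raw

Likewise, the conductor's memory threshold can be tuned in ``ironic.conf``
(the value below is an arbitrary example, not a recommendation):

.. code-block:: ini

   [DEFAULT]
   # Arbitrary example value: the minimum memory that must be available
   # before the conductor starts a memory-intensive operation such as
   # image conversion (consult your release's configuration reference
   # for the unit and default).
   memory_required_minimum = 1024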