Add an article about profiling into the docs

Change-Id: I79befa91fcaf393ec19498372f1e6c6953151867
This commit is contained in:
Renat Akhmerov 2020-02-18 16:12:06 +07:00
parent 0a3880c22c
commit cc825ebcd3
2 changed files with 299 additions and 0 deletions

View File

@ -7,5 +7,6 @@ Contributor Documentation
coding_guidelines
debugging_and_testing
profiling
troubleshooting
devstack

View File

@ -0,0 +1,298 @@
Profiling Mistral
=================
What Is Profiling?
------------------
Profiling is a procedure for gathering runtime statistics about certain code
snippets like:
- The maximum run time
- The minimum run time
- The average run time
- The number of runs
Such info is a key to understanding performance bottlenecks residing in
a system. Having these metrics, we can focus on places in code that slow
down the system most and come up with optimisations to improve them.
A typical code snippet eligible for gathering this kind of information is a
function or a method since, most popular engineering techniques encourage
developers to decompose code into functions/methods representing well defined
parts of program logic. However, any arbitrary piece of code may be a target
for measuring.
'osprofiler' Project
--------------------
`osprofiler <https://osprofiler.readthedocs.io/en/latest/>`_ is a project
created within the OpenStack ecosystem to do profiling. The paragraphs below
explain how Mistral uses 'osprofiler' for profiling. The central concept of
'osprofiler' is a profile trace. A developer can mark code snippets with
profiler traces and 'osprofiler' will be tracking them. In general,
'osprofiler' allows cross-service profiling, that is, tracking a chain of
calls that belong to different RESTful services but related with the same
user request. However, this guide doesn't cover this more complex use case
and focus on profiling within just one service, Mistral.
Profiler Traces
---------------
The most common way to create a profiler trace in the code is adding a
special ''@trace"
.. code-block:: python
from osprofiler import profiler
class DefaultEngine(base.Engine):
...
@profiler.trace('engine-on-action-complete', hide_args=True)
def on_action_complete(self, action_ex_id, result, wf_action=False,
async_=False):
with db_api.transaction():
if wf_action:
action_ex = db_api.get_workflow_execution(action_ex_id)
if result is None:
result = ml_actions.Result(data=action_ex.output)
else:
action_ex = db_api.get_action_execution(action_ex_id)
action_handler.on_action_complete(action_ex, result)
return action_ex.get_clone()
In this example, we applied a special decorator to a method that adds a
profiling trace. The most important argument of the decorator is a trace
name. Its value is 'engine-on-action-complete' in our case. The second
argument 'hide_args' defines whether 'osprofiler' needs to pass method
argument values down to other layers. More specifically, there's a notion
metrics collector in 'osprofiler' that accumulates info about traces
in any desirable form, it depends on a particular implementation. This
topic though is out of the scope of this document. For our purposes, it's
better to set this argument to **True** which will not lead to loosing
performance on processing additional data (argument values of all method
calls).
Another way of adding a profiling trace is the following:
.. code-block:: python
try:
profiler.start("engine-on-action-complete")
action_handler.on_action_complete(action_ex, result)
finally:
profiler.stop()
Here we don't decorate the entire method, we only want to profile just one
line of code. But like in the previous example, we added a profiling trace.
The obvious advantage of using the decorator is that it can live in code
permanently because it doesn't pollute it too much and we can use them any
time we want to profile the system.
Even simpler and more concise way to achieve the same is use a special
context manager from 'osprofiler':
.. code-block:: python
with profiler.Trace('engine-on-action-complete'):
action_handler.on_action_complete(action_ex, result)
Configuring Mistral for Profiling
---------------------------------
To start a profiling session, one needs to make the steps below.
Mistral Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^
Make these change in the config file:
.. code-block:: cfg
[DEFAULT]
log_config_append = wf_trace_logging.conf
[profiler]
enabled = True
hmac_keys = secret_word
Defining the 'log_config_append' property allows to have all the logging
configuration in a separate file. In the example above, it's called
'wf_trace_logging.conf' but it can have a different name, if needed.
'[profiler]' group directly refers to the 'osprofiler' project and is
brought by it. The property 'enabled' is self-explaining, but the other one
is not. The value of the property 'hmac_keys' basically needs to be known
by someone who wants to start a profiling session. This value needs to be
passed as part of the user request. It will be shown a bit later.
Logging Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^
The content of the logging configuration file conforms the documentation for
the standard 'logging' Python module. Find more details at
https://docs.python.org/3/library/logging.config.html#configuration-file-format
This particular example of the logging file configures three different loggers
and their corresponding counterparts like handlers. For the purpose of this
document though we only need to pay attention how 'profiler_trace' logger is
configure. Every entity starting with 'profiler' is related to profiling
configuration. The reason why other loggers are also included here is to show
how different loggers can coexist within one configuration file and how they
can reuse same entities.
.. code-block:: cfg
[loggers]
keys=workflow_trace,profiler_trace,root
[handlers]
keys=consoleHandler, wfTraceFileHandler, profilerFileHandler, fileHandler
[formatters]
keys=wfFormatter, profilerFormatter, simpleFormatter, verboseFormatter
[logger_workflow_trace]
level=INFO
handlers=consoleHandler, wfTraceFileHandler
qualname=workflow_trace
propagate=0
[logger_profiler_trace]
level=INFO
handlers=profilerFileHandler
qualname=profiler_trace
[logger_root]
level=DEBUG
handlers=fileHandler
[handler_fileHandler]
class=FileHandler
level=DEBUG
formatter=verboseFormatter
args=("/tmp/mistral.log",)
[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)
[handler_wfTraceFileHandler]
class=FileHandler
level=INFO
formatter=wfFormatter
args=("/tmp/mistral_wf_trace.log",)
[handler_profilerFileHandler]
class=FileHandler
level=INFO
formatter=profilerFormatter
args=("/tmp/mistral_osprofile.log",)
[formatter_verboseFormatter]
format=%(asctime)s %(thread)s %(levelname)s %(module)s [-] %(message)s
datefmt=
[formatter_simpleFormatter]
format=%(asctime)s - %(message)s
datefmt=%y-%m-%d %H:%M:%S
[formatter_wfFormatter]
format=%(asctime)s WF [-] %(message)s
datefmt=
[formatter_profilerFormatter]
format=%(message)s
datefmt=%H:%M:%S
Triggering Profiling Sessions
-----------------------------
Once Mistral is configured like explained above, in order to start a
profiling session we need to make a user request to Mistral that we
want to analyse but adding one property to it. The name of the property
is 'profile' and it needs to be set to the value of the 'hmac_keys'
property from the main configuration file.
.. code-block:: bash
$ mistral execution-create my_slow_workflow --profile secret_word
Profiling Session Result
------------------------
When started in a profiling mode like just shown, Mistral will be writing
info about the profiling traces into the configured file. In our case it is
'/tmp/mistral_osprofile.log'.
.. code-block:: cfg
2020-02-27T08:04:25.789433 f12e75d5-5d59-4cbc-b74d-357f19290dd7 f12e75d5-5d59-4cbc-b74d-357f19290dd7 b9b29981-0916-4635-af18-d6c92f991f46 engine-start-workflow-start
2020-02-27T08:04:25.790232 f12e75d5-5d59-4cbc-b74d-357f19290dd7 b9b29981-0916-4635-af18-d6c92f991f46 3cdd41b5-318a-4926-a38e-63344b6aef7a workflow-handler-start-workflow-start
2020-02-27T08:04:25.812879 f12e75d5-5d59-4cbc-b74d-357f19290dd7 3cdd41b5-318a-4926-a38e-63344b6aef7a 603f1fab-be78-438d-af13-d94ed3b7e416 workflow-start-start
2020-02-27T08:04:25.954502 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 b1d0a77a-52f5-4415-a6c4-f16b3591a47d workflow-set-state-start
2020-02-27T08:04:25.961298 0.006782 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 b1d0a77a-52f5-4415-a6c4-f16b3591a47d workflow-set-state-stop
2020-02-27T08:04:25.961769 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 27b58351-aebe-4e37-9cec-91fdbef5c68b wf-controller-get-controller-start
2020-02-27T08:04:25.962041 0.000267 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 27b58351-aebe-4e37-9cec-91fdbef5c68b wf-controller-get-controller-stop
2020-02-27T08:04:25.962311 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 605ebfc2-a2bb-4fe1-8159-fc16f6741f5f workflow-controller-continue-workflow-start
2020-02-27T08:04:26.023134 0.060832 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 605ebfc2-a2bb-4fe1-8159-fc16f6741f5f workflow-controller-continue-workflow-stop
2020-02-27T08:04:26.023600 f12e75d5-5d59-4cbc-b74d-357f19290dd7 603f1fab-be78-438d-af13-d94ed3b7e416 3a5a384a-9598-4844-a740-981f92e604af dispatcher-dispatch-commands-start
2020-02-27T08:04:26.023918 f12e75d5-5d59-4cbc-b74d-357f19290dd7 3a5a384a-9598-4844-a740-981f92e604af d84a13e4-4763-4321-ab08-8cbd19656f2f task-handler-run-task-start
2020-02-27T08:04:26.024179 f12e75d5-5d59-4cbc-b74d-357f19290dd7 d84a13e4-4763-4321-ab08-8cbd19656f2f 7878e4f8-aaaa-4b9b-b15a-35848b5cdd61 task-handler-build-task-from-command-start
2020-02-27T08:04:26.024422 0.000243 f12e75d5-5d59-4cbc-b74d-357f19290dd7 d84a13e4-4763-4321-ab08-8cbd19656f2f 7878e4f8-aaaa-4b9b-b15a-35848b5cdd61 task-handler-build-task-from-command-stop
So any time Mistral runs code marked as a profiling trace it prints two
entries into the file: right before the code snippet starts and right
after its completion. Notice also that for the corresponding "-stop" entry
(the suffix going after the trace name) Mistral prints an additional number
in the second column. This is a duration of the code snippet.
This content of this file itself is probably not so useful (although, it
might be for some purpose) but based on it we can build the following
report:
.. code-block:: bash
Total time | Max time | Avg time | Occurrences | Trace name
-------------------------------------------------------------------------------------------
2948.326 8.612 1.218 2420 engine-on-action-complete
2859.172 8.516 1.181 2420 action-handler-on-action-complete
2812.726 8.482 1.162 2420 task-handler-on-action-complete
2767.836 8.412 1.144 2420 regular-task-on-action-complete
2766.199 8.411 1.143 2420 task-complete
2702.764 8.351 0.460 5878 task-run
2506.531 8.354 0.850 2948 dispatcher-dispatch-commands
2503.398 8.353 0.437 5735 task-handler-run-task
2488.940 8.350 0.434 5735 task-run-new
1669.179 54.737 0.881 1894 default-executor-run-action
1201.582 3.687 0.497 2420 regular-task-get-action-input
1126.351 2.093 0.476 2366 ad-hoc-action-validate-input
1125.129 2.092 0.238 4732 ad-hoc-action-prepare-input
687.619 7.594 0.651 1056 task-handler-refresh-task-state
387.622 3.872 0.300 1291 workflow-handler-check-and-fix-integrity
234.231 4.068 0.392 597 workflow-handler-check-and-complete
224.026 4.042 0.375 597 workflow-check-and-complete
210.184 6.694 1.470 143 task-run-existing
160.118 8.343 0.304 526 workflow-action-schedule
141.398 4.546 0.268 528 workflow-handler-start-workflow
109.641 4.361 0.208 528 workflow-start
78.683 2.004 0.077 1024 direct-wf-controller-get-join-logical-state
...
To generate this report, run:
.. code-block:: bash
$ python tools/rank_profiled_methods.py /tmp/mistral_osprofile.log report.txt
And this report is somewhat really useful when it comes to analysing
performance bottlenecks. All times are shown in seconds.