taskflow/doc/source/engines.rst
Joshua Harlow 46d30eeb29 Move implementations into there own sub-sections
Also fixes up some inline-code/examples docs to
correctly display in the generated docs (and
tweaks some URI capitalization).

Change-Id: I001ef2460eb5e9a884ca6db6e8d6f72864191fe7
2015-05-01 17:53:41 -07:00

20 KiB

Engines

Overview

Engines are what really runs your atoms.

An engine takes a flow structure (described by patterns <patterns>) and uses it to decide which atom <atoms> to run and when.

TaskFlow provides different implementations of engines. Some may be easier to use (ie, require no additional infrastructure setup) and understand; others might require more complicated setup but provide better scalability. The idea and ideal is that deployers or developers of a service that use TaskFlow can select an engine that suites their setup best without modifying the code of said service.

Note

Engines usually have different capabilities and configuration, but all of them must implement the same interface and preserve the semantics of patterns (e.g. parts of a :py.linear_flow.Flow are run one after another, in order, even if the selected engine is capable of running tasks in parallel).

Why they exist

An engine being the core component which actually makes your flows progress is likely a new concept for many programmers so let's describe how it operates in more depth and some of the reasoning behind why it exists. This will hopefully make it more clear on their value add to the TaskFlow library user.

First though let us discuss something most are familiar already with; the difference between declarative and imperative programming models. The imperative model involves establishing statements that accomplish a programs action (likely using conditionals and such other language features to do this). This kind of program embeds the how to accomplish a goal while also defining what the goal actually is (and the state of this is maintained in memory or on the stack while these statements execute). In contrast there is the declarative model which instead of combining the how to accomplish a goal along side the what is to be accomplished splits these two into only declaring what the intended goal is and not the how. In TaskFlow terminology the what is the structure of your flows and the tasks and other atoms you have inside those flows, but the how is not defined (the line becomes blurred since tasks themselves contain imperative code, but for now consider a task as more of a pure function that executes, reverts and may require inputs and provide outputs). This is where engines get involved; they do the execution of the what defined via atoms <atoms>, tasks, flows and the relationships defined there-in and execute these in a well-defined manner (and the engine is responsible for any state manipulation instead).

This mix of imperative and declarative (with a stronger emphasis on the declarative model) allows for the following functionality to become possible:

  • Enhancing reliability: Decoupling of state alterations from what should be accomplished allows for a natural way of resuming by allowing the engine to track the current state and know at which point a workflow is in and how to get back into that state when resumption occurs.
  • Enhancing scalability: When an engine is responsible for executing your desired work it becomes possible to alter the how in the future by creating new types of execution backends (for example the worker model which does not execute locally). Without the decoupling of the what and the how it is not possible to provide such a feature (since by the very nature of that coupling this kind of functionality is inherently very hard to provide).
  • Enhancing consistency: Since the engine is responsible for executing atoms and the associated workflow, it can be one (if not the only) of the primary entities that is working to keep the execution model in a consistent state. Coupled with atoms which should be immutable and have have limited (if any) internal state the ability to reason about and obtain consistency can be vastly improved.
    • With future features around locking (using tooz to help) engines can also help ensure that resources being accessed by tasks are reliably obtained and mutated on. This will help ensure that other processes, threads, or other types of entities are also not executing tasks that manipulate those same resources (further increasing consistency).

Of course these kind of features can come with some drawbacks:

  • The downside of decoupling the how and the what is that the imperative model where functions control & manipulate state must start to be shifted away from (and this is likely a mindset change for programmers used to the imperative model). We have worked to make this less of a concern by creating and encouraging the usage of persistence <persistence>, to help make it possible to have state and tranfer that state via a argument input and output mechanism.
  • Depending on how much imperative code exists (and state inside that code) there may be significant rework of that code and converting or refactoring it to these new concepts. We have tried to help here by allowing you to have tasks that internally use regular python code (and internally can be written in an imperative style) as well as by providing examples <examples> that show how to use these concepts.
  • Another one of the downsides of decoupling the what from the how is that it may become harder to use traditional techniques to debug failures (especially if remote workers are involved). We try to help here by making it easy to track, monitor and introspect the actions & state changes that are occurring inside an engine (see notifications <notifications> for how to use some of these capabilities).

Creating

All engines are mere classes that implement the same interface, and of course it is possible to import them and create instances just like with any classes in Python. But the easier (and recommended) way for creating an engine is using the engine helper functions. All of these functions are imported into the taskflow.engines module namespace, so the typical usage of these functions might look like:

from taskflow import engines

...
flow = make_flow()
eng = engines.load(flow, engine='serial', backend=my_persistence_conf)
eng.run()
...

taskflow.engines.helpers

Usage

To select which engine to use and pass parameters to an engine you should use the engine parameter any engine helper function accepts and for any engine specific options use the kwargs parameter.

Types

Serial

Engine type: 'serial'

Runs all tasks on a single thread -- the same thread engine.run() is called from.

Note

This engine is used by default.

Tip

If eventlet is used then this engine will not block other threads from running as eventlet automatically creates a implicit co-routine system (using greenthreads and monkey patching). See eventlet and greenlet for more details.

Parallel

Engine type: 'parallel'

A parallel engine schedules tasks onto different threads/processes to allow for running non-dependent tasks simultaneously. See the documentation of :py~taskflow.engines.action_engine.engine.ParallelActionEngine for supported arguments that can be used to construct a parallel engine that runs using your desired execution model.

Tip

Sharing an executor between engine instances provides better scalability by reducing thread/process creation and teardown as well as by reusing existing pools (which is a good practice in general).

Warning

Running tasks with a process pool executor is experimentally supported. This is mainly due to the futures backport and the multiprocessing module that exist in older versions of python not being as up to date (with important fixes such as 4892, 6721, 9205, 16284, 22393 and others...) as the most recent python version (which themselves have a variety of ongoing/recent bugs).

Workers

Engine type: 'worker-based' or 'workers'

Note

Since this engine is significantly more complicated (and different) then the others we thought it appropriate to devote a whole documentation section to it.

For further information, please refer to the the following:

workers

How they run

To provide a peek into the general process that an engine goes through when running lets break it apart a little and describe what one of the engine types does while executing (for this we will look into the :py~taskflow.engines.action_engine.engine.ActionEngine engine type).

Creation

The first thing that occurs is that the user creates an engine for a given flow, providing a flow detail (where results will be saved into a provided persistence <persistence> backend). This is typically accomplished via the methods described above in creating engines. The engine at this point now will have references to your flow and backends and other internal variables are setup.

Compiling

During this stage (see :py~taskflow.engines.base.Engine.compile) the flow will be converted into an internal graph representation using a compiler (the default implementation for patterns is the :py~taskflow.engines.action_engine.compiler.PatternCompiler). This class compiles/converts the flow objects and contained atoms into a networkx directed graph (and tree structure) that contains the equivalent atoms defined in the flow and any nested flows & atoms as well as the constraints that are created by the application of the different flow patterns. This graph (and tree) are what will be analyzed & traversed during the engines execution. At this point a few helper object are also created and saved to internal engine variables (these object help in execution of atoms, analyzing the graph and performing other internal engine activities). At the finishing of this stage a :py~taskflow.engines.action_engine.runtime.Runtime object is created which contains references to all needed runtime components.

Preparation

This stage (see :py~taskflow.engines.base.Engine.prepare) starts by setting up the storage needed for all atoms in the compiled graph, ensuring that corresponding :py~taskflow.persistence.logbook.AtomDetail (or subclass of) objects are created for each node in the graph.

Validation

This stage (see :py~taskflow.engines.base.Engine.validate) performs any final validation of the compiled (and now storage prepared) engine. It compares the requirements that are needed to start execution and what is currently provided or will be produced in the future. If there are any atom requirements that are not satisfied (no known current provider or future producer is found) then execution will not be allowed to continue.

Execution

The graph (and helper objects) previously created are now used for guiding further execution (see :py~taskflow.engines.base.Engine.run). The flow is put into the RUNNING state <states> and a :py~taskflow.engines.action_engine.runner.Runner implementation object starts to take over and begins going through the stages listed below (for a more visual diagram/representation see the engine state diagram <engine states>).

Note

The engine will respect the constraints imposed by the flow. For example, if an engine is executing a :py~taskflow.patterns.linear_flow.Flow then it is constrained by the dependency graph which is linear in this case, and hence using a parallel engine may not yield any benefits if one is looking for concurrency.

Resumption

One of the first stages is to analyze the state <states> of the tasks in the graph, determining which ones have failed, which one were previously running and determining what the intention of that task should now be (typically an intention can be that it should REVERT, or that it should EXECUTE or that it should be IGNORED). This intention is determined by analyzing the current state of the task; which is determined by looking at the state in the task detail object for that task and analyzing edges of the graph for things like retry atom which can influence what a tasks intention should be (this is aided by the usage of the :py~taskflow.engines.action_engine.analyzer.Analyzer helper object which was designed to provide helper methods for this analysis). Once these intentions are determined and associated with each task (the intention is also stored in the :py~taskflow.persistence.logbook.AtomDetail object) the scheduling <scheduling> stage starts.

Scheduling

This stage selects which atoms are eligible to run by using a :py~taskflow.engines.action_engine.scheduler.Scheduler implementation (the default implementation looks at their intention, checking if predecessor atoms have ran and so-on, using a :py~taskflow.engines.action_engine.analyzer.Analyzer helper object as needed) and submits those atoms to a previously provided compatible executor for asynchronous execution. This :py~taskflow.engines.action_engine.scheduler.Scheduler will return a future object for each atom scheduled; all of which are collected into a list of not done futures. This will end the initial round of scheduling and at this point the engine enters the waiting <waiting> stage.

Waiting

In this stage the engine waits for any of the future objects previously submitted to complete. Once one of the future objects completes (or fails) that atoms result will be examined and finalized using a :py~taskflow.engines.action_engine.completer.Completer implementation. It typically will persist results to a provided persistence backend (saved into the corresponding :py~taskflow.persistence.logbook.AtomDetail and :py~taskflow.persistence.logbook.FlowDetail objects via the :py~taskflow.storage.Storage helper) and reflect the new state of the atom. At this point what typically happens falls into two categories, one for if that atom failed and one for if it did not. If the atom failed it may be set to a new intention such as RETRY or REVERT (other atoms that were predecessors of this failing atom may also have there intention altered). Once this intention adjustment has happened a new round of scheduling <scheduling> occurs and this process repeats until the engine succeeds or fails (if the process running the engine dies the above stages will be restarted and resuming will occur).

Note

If the engine is suspended while the engine is going through the above stages this will stop any further scheduling stages from occurring and all currently executing work will be allowed to finish (see suspension <suspension>).

Finishing

At this point the :py~taskflow.engines.action_engine.runner.Runner has now finished successfully, failed, or the execution was suspended. Depending on which one of these occurs will cause the flow to enter a new state (typically one of FAILURE, SUSPENDED, SUCCESS or REVERTED). Notifications <notifications> will be sent out about this final state change (other state changes also send out notifications) and any failures that occurred will be reraised (the failure objects are wrapped exceptions). If no failures have occurred then the engine will have finished and if so desired the persistence <persistence> can be used to cleanup any details that were saved for this execution.

Special cases

Suspension

Each engine implements a :py~taskflow.engines.base.Engine.suspend method that can be used to externally (or in the future internally) request that the engine stop scheduling <scheduling> new work. By default what this performs is a transition of the flow state from RUNNING into a SUSPENDING state (which will later transition into a SUSPENDED state). Since an engine may be remotely executing atoms (or locally executing them) and there is currently no preemption what occurs is that the engines :py~taskflow.engines.action_engine.runner.Runner state machine will detect this transition into SUSPENDING has occurred and the state machine will avoid scheduling new work (it will though let active work continue). After the current work has finished the engine will transition from SUSPENDING into SUSPENDED and return from its :py~taskflow.engines.base.Engine.run method.

Note

When :py~taskflow.engines.base.Engine.run is returned from at that point there may (but does not have to be, depending on what was active when :py~taskflow.engines.base.Engine.suspend was called) be unfinished work in the flow that was not finished (but which can be resumed at a later point in time).

Scoping

During creation of flows it is also important to understand the lookup strategy (also typically known as scope resolution) that the engine you are using will internally use. For example when a task A provides result 'a' and a task B after A provides a different result 'a' and a task C after A and after B requires 'a' to run, which one will be selected?

Default strategy

When an engine is executing it internally interacts with the :py~taskflow.storage.Storage class and that class interacts with the a :py~taskflow.engines.action_engine.scopes.ScopeWalker instance and the :py~taskflow.storage.Storage class uses the following lookup order to find (or fail) a atoms requirement lookup/request:

  1. Transient injected atom specific arguments.
  2. Non-transient injected atom specific arguments.
  3. Transient injected arguments (flow specific).
  4. Non-transient injected arguments (flow specific).
  5. First scope visited provider that produces the named result; note that if multiple providers are found in the same scope the first (the scope walkers yielded ordering defines what first means) that produced that result and can be extracted without raising an error is selected as the provider of the requested requirement.
  6. Fails with :py~taskflow.exceptions.NotFound if unresolved at this point (the cause attribute of this exception may have more details on why the lookup failed).

Note

To examine this this information when debugging it is recommended to enable the BLATHER logging level (level 5). At this level the storage and scope code/layers will log what is being searched for and what is being found.

Interfaces

taskflow.engines.base

Implementations

taskflow.engines.action_engine.engine

Components

taskflow.engines.action_engine.analyzer

taskflow.engines.action_engine.compiler

taskflow.engines.action_engine.completer

taskflow.engines.action_engine.executor

taskflow.engines.action_engine.runner

taskflow.engines.action_engine.runtime

taskflow.engines.action_engine.scheduler

taskflow.engines.action_engine.scopes.ScopeWalker

Hierarchy

taskflow.engines.action_engine.engine.ActionEngine taskflow.engines.base.Engine taskflow.engines.worker_based.engine.WorkerBasedActionEngine