A C++ Workflow Management System for e-science

Logical Architecture

A WfMS is a software component that takes a formal description of processes as input and maintains the state of process executions, delegating activities among people and applications. The process description is written in a Workflow Description Language, which is discussed in the next section. Fig. 1 outlines the CppWfMS architecture:

Fig. 1

The layered architecture abstracts from both the particular Grid infrastructure and the workflow language, which provides portability and multi-language compatibility. At the bottom lies the basic Grid infrastructure: a collection of computational and storage resources. These resources are transparent to users thanks to a so-called Grid middleware, which acts as a mediator providing consistent and homogeneous access to them.

Since multiple Grid infrastructures still exist, a Grid Abstraction Layer is introduced to abstract high-level Grid functionalities such as job submission, data transfer, job state observation and resource reservation. This decouples the workflow engine from the underlying Grid infrastructure, allowing workflows to use a large set of Grid infrastructures and therefore of resources.
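As an illustration of this decoupling, the four functionalities listed above can be sketched as an abstract C++ interface. All names here (`GridAdapter`, `FakeGridAdapter`, `runTask`, `JobState`) are hypothetical and chosen for the example; they are not taken from the CppWfMS code base, and the in-memory adapter only stands in for a real middleware binding.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical job states; a real middleware may expose more.
enum class JobState { Submitted, Running, Done, Failed };

// Hypothetical abstract interface covering the four functionalities
// named in the text. The engine would depend only on this type.
class GridAdapter {
public:
    virtual ~GridAdapter() = default;
    virtual std::string submitJob(const std::string& executable) = 0;
    virtual bool transferData(const std::string& from, const std::string& to) = 0;
    virtual JobState jobState(const std::string& jobId) const = 0;
    virtual bool reserveResource(const std::string& resource) = 0;
};

// In-memory stand-in for a concrete middleware binding (illustration only).
class FakeGridAdapter : public GridAdapter {
public:
    std::string submitJob(const std::string& executable) override {
        (void)executable;  // a real adapter would hand this to the middleware
        std::string id = "job-" + std::to_string(++counter_);
        jobs_[id] = JobState::Submitted;
        return id;
    }
    bool transferData(const std::string& from, const std::string& to) override {
        return !from.empty() && !to.empty();  // pretend the copy succeeded
    }
    JobState jobState(const std::string& jobId) const override {
        auto it = jobs_.find(jobId);
        return it == jobs_.end() ? JobState::Failed : it->second;
    }
    bool reserveResource(const std::string& resource) override {
        // insert() reports whether the resource was still free
        return reserved_.insert(resource).second;
    }
private:
    int counter_ = 0;
    std::map<std::string, JobState> jobs_;
    std::set<std::string> reserved_;
};

// Engine-side code sees only the abstract interface.
std::string runTask(GridAdapter& grid, const std::string& executable) {
    return grid.submitJob(executable);
}
```

Supporting another Grid infrastructure would then mean adding one more class derived from `GridAdapter`, without touching engine-side functions such as `runTask`.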

The workflow engine is the main component of the WfMS: it submits tasks to the Grid, taking care of their dependencies and of the overall workflow execution. The engine proposed in this paper executes workflows represented in terms of Petri Nets.
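To make the Petri Net view of task dependencies concrete, here is a minimal sketch of a place/transition net in which each transition stands for a task and tokens in places record which dependencies are satisfied. It assumes arcs of weight one and that a place appears at most once among a transition's inputs; the types `Transition` and `PetriNet` and the function `run` are illustrative names, not the engine's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A task: consumes one token from each input place,
// produces one token in each output place.
struct Transition {
    std::string name;
    std::vector<std::size_t> inputs;   // indices into PetriNet::marking
    std::vector<std::size_t> outputs;  // indices into PetriNet::marking
};

struct PetriNet {
    std::vector<int> marking;             // token count per place
    std::vector<Transition> transitions;

    // A transition is enabled when every input place holds a token.
    bool enabled(const Transition& t) const {
        for (std::size_t p : t.inputs)
            if (marking[p] == 0) return false;
        return true;
    }

    // Firing moves tokens from the input places to the output places.
    void fire(const Transition& t) {
        assert(enabled(t));
        for (std::size_t p : t.inputs) --marking[p];
        for (std::size_t p : t.outputs) ++marking[p];
    }
};

// Fires enabled transitions one at a time (first match wins) until none
// is enabled or maxSteps is reached; returns the names in firing order.
std::vector<std::string> run(PetriNet& net, std::size_t maxSteps) {
    std::vector<std::string> fired;
    while (fired.size() < maxSteps) {
        bool progressed = false;
        for (const Transition& t : net.transitions) {
            if (net.enabled(t)) {
                net.fire(t);
                fired.push_back(t.name);
                progressed = true;
                break;
            }
        }
        if (!progressed) break;
    }
    return fired;
}
```

With places `start`, `mid` and `end`, one token in `start`, and two transitions `A` (start to mid) and `B` (mid to end), `run` fires `A` before `B` regardless of the order in which they are declared, because `B` only becomes enabled once `A` has produced its token. A real engine would submit a job where this sketch merely records the name.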

The top layer aims at language independence. The basic idea is to make the workflow engine compatible with a large set of workflow languages. A pluggable system of parsers provides support for several languages, allowing collaboration between WfMSs and support for legacy workflows. As we will see in more detail in the next section, this layer is responsible for extracting the semantics of a workflow description and translating it into a Petri Net. In the other direction, the gateway makes it possible to transfer parts of a process to a different WfMS.
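One possible shape for such a pluggable system is a registry that maps a language name to a parser function. The sketch below is an assumption made for illustration: `ParserRegistry`, `WorkflowModel`, and the comma-separated "toy-seq" language are invented here and do not describe the real CppWfMS parsers or any existing workflow language.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed model: just an ordered list of task names.
struct WorkflowModel {
    std::vector<std::string> tasks;
};

// A parser turns a textual description into a model.
using Parser = std::function<WorkflowModel(const std::string&)>;

// Registry of parsers keyed by language name.
class ParserRegistry {
public:
    void add(const std::string& language, Parser parser) {
        parsers_[language] = std::move(parser);
    }
    WorkflowModel parse(const std::string& language,
                        const std::string& text) const {
        auto it = parsers_.find(language);
        if (it == parsers_.end())
            throw std::invalid_argument("no parser for language: " + language);
        return it->second(text);
    }
private:
    std::map<std::string, Parser> parsers_;
};

// Parser for an invented language: task names separated by commas,
// to be executed in sequence. Empty names are skipped.
WorkflowModel parseToySequence(const std::string& text) {
    WorkflowModel model;
    std::istringstream in(text);
    std::string task;
    while (std::getline(in, task, ','))
        if (!task.empty()) model.tasks.push_back(task);
    return model;
}
```

A caller would register the parser once, for example `registry.add("toy-seq", parseToySequence)`, and later call `registry.parse("toy-seq", "fetch,compute,store")`. Adding a language then requires only a new parser function and one registration call.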

Workflow Gateway

Scientific workflows lack a recognized standard for process description; as a consequence, several workflow languages exist. Apart from the syntax, these languages differ in the formalism used to express the workflow model. Most graphical workflow languages are based on Directed Acyclic Graphs (DAGs), where the control flow can be described in terms of sequence, parallelism and choice. Formalisms such as Petri Nets and Pi-Calculus are more powerful than DAGs because they also make it possible to define iteration (also known as a loop or cycle). As a consequence of this variety of languages and formalisms, WfMSs are incompatible with one another.

However, translation from one language to another is often possible; what is needed is a language parser, a model converter and a compiler, as shown in the top layer of Fig. 1. Parsers are responsible for extracting the workflow semantics from a description expressed in a workflow language. As explained before, workflows can be described in terms of DAGs, Petri Nets, Pi-Calculus or Activity Diagrams, albeit with different levels of expressiveness (e.g. a DAG cannot describe an iteration of tasks). Conversion between these formalisms is provided by the model converter, which is the critical part of the process. In fact, it is not always possible to represent one model in terms of another; for example, a Petri Net containing a loop cannot be converted into a DAG. Finally, compilers translate the model into a description in a specific language, usually different from the initial one.
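The direction of conversion that is always possible, from a DAG to a Petri Net, can be sketched as follows. The mapping used here is one common choice and an assumption of this example, not necessarily the one used by the CppWfMS model converter: every task becomes a transition, every dependency edge becomes a place, tasks without predecessors receive a start place holding one token, and tasks without successors receive an end place. The types `Dag`, `Transition`, `PetriNet` and the function `toPetriNet` are illustrative names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A DAG of tasks; each edge (from, to) means "to depends on from".
// Edge indices are assumed to be valid indices into tasks.
struct Dag {
    std::vector<std::string> tasks;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

struct Transition {
    std::string name;
    std::vector<std::size_t> inputs;   // place indices
    std::vector<std::size_t> outputs;  // place indices
};

struct PetriNet {
    std::vector<int> marking;          // token count per place
    std::vector<Transition> transitions;
};

PetriNet toPetriNet(const Dag& dag) {
    PetriNet net;

    // One transition per task, in the same order as dag.tasks.
    for (const std::string& task : dag.tasks)
        net.transitions.push_back(Transition{task, {}, {}});

    // One initially empty place per dependency edge.
    for (const auto& edge : dag.edges) {
        std::size_t place = net.marking.size();
        net.marking.push_back(0);
        net.transitions[edge.first].outputs.push_back(place);
        net.transitions[edge.second].inputs.push_back(place);
    }

    // Entry tasks get a marked start place; exit tasks get an end place.
    for (Transition& t : net.transitions) {
        if (t.inputs.empty()) {
            t.inputs.push_back(net.marking.size());
            net.marking.push_back(1);
        }
        if (t.outputs.empty()) {
            t.outputs.push_back(net.marking.size());
            net.marking.push_back(0);
        }
    }
    return net;
}
```

For a diamond-shaped DAG (A before B and C, both before D) this yields four transitions and six places: four for the edges, one marked start place feeding A, and one end place fed by D. The reverse direction has no such general construction, since a net with a cycle has no acyclic equivalent.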

A debate exists around the best formalism to use for workflow description; Petri Nets and Pi-Calculus are both widely used for workflow modeling. Since both formalisms are highly expressive (Pi-Calculus is Turing-complete, as are Petri Nets once extended with, for example, inhibitor arcs), the choice depends on the way these models deal with workflow patterns. Workflow patterns are a collection of well-known problems, and solutions, related to the support of process-oriented applications. According to recent studies, Petri Nets outperform other formalisms in workflow description thanks to their formal semantics; in addition, several analysis techniques exist to determine the properties of a process design (correctness, absence of deadlocks and boundedness). For those reasons we decided to use the Petri Net formalism as the internal representation for workflows. This choice is also compatible with the one made by other CoreGRID partners such as Fraunhofer FIRST, which introduced an XML-based language called GWorkflowDL that allows the representation of abstract and concrete workflows using Petri Nets (more details are provided in the next section).

Compilers come into play when a workflow model, or a part of it, needs to be represented in a specific language. For example, when a part of a process, usually a sub-workflow, is transferred to a different WfMS, the internal Petri Net representation must be converted into a language description that the third-party WfMS understands.

Workflow Engine

The engine is the core component of a WfMS; the CppWfMS project implements a Petri Net-based workflow engine. The engine is responsible for executing workflows written in the GWorkflowDL language. It takes care of the dependencies between workflow activities and schedules them on the underlying Grid middleware. The engine implementation has to deal with several problems related to the non-deterministic behaviour of Petri Nets. In fact, the deterministic, Turing-machine-based execution model provided by mainstream imperative languages (such as C/C++, Java and C#) does not deal natively with non-determinism. The design choices and the architectural details of the engine are discussed in this section.
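The non-determinism problem shows up whenever several transitions are enabled at once and compete for the same token. One simple way to emulate the choice in an imperative language, shown here purely as an assumption and not as the strategy adopted by CppWfMS, is to collect the enabled transitions and pick one with a pseudo-random generator. The names `PetriNet`, `Transition` and `stepOnce` are hypothetical, and the sketch assumes arcs of weight one.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

struct Transition {
    std::string name;
    std::vector<std::size_t> inputs;   // place indices
    std::vector<std::size_t> outputs;  // place indices
};

struct PetriNet {
    std::vector<int> marking;          // token count per place
    std::vector<Transition> transitions;

    // Enabled when every input place holds at least one token.
    bool enabled(std::size_t i) const {
        for (std::size_t p : transitions[i].inputs)
            if (marking[p] == 0) return false;
        return true;
    }
    void fire(std::size_t i) {
        assert(enabled(i));
        for (std::size_t p : transitions[i].inputs) --marking[p];
        for (std::size_t p : transitions[i].outputs) ++marking[p];
    }
};

// Fires one enabled transition chosen uniformly at random.
// Returns false when no transition is enabled; otherwise stores the
// index of the fired transition in firedIndex and returns true.
bool stepOnce(PetriNet& net, std::mt19937& rng, std::size_t& firedIndex) {
    std::vector<std::size_t> candidates;
    for (std::size_t i = 0; i < net.transitions.size(); ++i)
        if (net.enabled(i)) candidates.push_back(i);
    if (candidates.empty()) return false;

    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    firedIndex = candidates[pick(rng)];
    net.fire(firedIndex);
    return true;
}
```

With a single token in a place shared by two transitions, one call to `stepOnce` fires exactly one of them and disables the other, which is the conflict situation a Petri Net engine must resolve. Seeding the generator makes a run reproducible, which helps when debugging a workflow.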
